Abstract

The Distributional Compositional Categorical (DisCoCat) model is a mathematical framework 
that provides compositional semantics for meanings of natural language sentences. It consists of a 
computational procedure for constructing meanings of sentences, given their grammatical structure 
in terms of compositional type-logic, and given the empirically derived meanings of their words. 
For the particular case that the meaning of words is modelled within a distributional vector space 
model, its experimental predictions, derived from real large scale data, have outperformed other em- 
pirically validated methods that could build vectors for a full sentence. This success can be attributed 
to a conceptually motivated mathematical underpinning, something which the other methods lack, 
by integrating qualitative compositional type-logic and quantitative modelling of meaning within a 
category-theoretic mathematical framework. 

The type-logic used in the DisCoCat model is Lambek's pregroup grammar. Pregroup types form 
a posetal compact closed category, which can be passed, in a functorial manner, on to the compact 
closed structure of vector spaces, linear maps and tensor product. The diagrammatic versions of the 
equational reasoning in compact closed categories can be interpreted as the flow of word meanings 
within sentences. Pregroups simplify Lambek's previous type-logic, the Lambek calculus. The latter 
and its extensions have been extensively used to formalise and reason about various linguistic phe- 
nomena. Hence, the apparent reliance of the DisCoCat on pregroups has been seen as a shortcoming. 
This paper addresses this concern, by pointing out that one may as well realise a functorial passage 
from the original type-logic of Lambek, a monoidal bi-closed category, to vector spaces, or to any 
other model of meaning organised within a monoidal bi-closed category. The corresponding string 
diagram calculus, due to Baez and Stay, now depicts the flow of word meanings, and also reflects the 
structure of the parse trees of the Lambek calculus. 
1 Introduction



Language is both empirical and compositional: we learn meanings of words by being exposed to lin- 
guistic practice, and we form sentences we've never heard before by composing words according to the 
rules of grammar. Various mathematical and formal models have sought to capture facets and aspects of 
language learning and formation. Compositional type-logical approaches [31] represent sentence formation
rules based on formal syntactic analysis, using formalisms such as context free grammars [10, 41],
Lambek calculus [31], or Combinatorial Categorial Grammar [57]. Such formal approaches to grammar
align well with Frege's notion of compositionality, according to which the meaning of a sentence
is a function of the meaning of its parts [19], but eschew the empirical nature of language, requiring 
pre-defined mathematical structures, domains and valuations to make sense. 

Orthogonal to formal logical models, empirical approaches to semantics construct representations of 
individual words based on the contexts in which they are used. These models are often referred to as 
distributional or geometric models of semantics and are sometimes considered to be in line with the
"meaning is use" view of Wittgenstein's philosophy of language [59]. Distributional models have been
applied successfully to tasks such as thesaurus extraction [23, 15], automated essay marking [37], and
other semantically motivated natural language processing tasks. While these models reflect the empirical 
aspects of language learning that type-logical models lack, they in turn lack composition operations 
which would allow us to learn meanings of phrases based on the meanings of their parts. Developing 
models that could combine the strengths of the above two approaches has proved to be a challenge for 
computational linguistics and its applications to natural language processing (NLP). 

The distributional compositional categorical (DisCoCat) model of meaning, developed in [13, 14], provides
a solution to the above problem. This framework, which realised the challenge proposed in [11],
enables a combination of the type-logical and distributional models of meaning and resulted in a procedure
for compositionally computing meaning vectors for sentences by exploiting the grammatical structure
of sentences and the meaning vectors of the words therein. The framework was inspired by the
category-theoretic high-level framework for modelling quantum protocols [2], where the corresponding
string diagram calculus exposes flows of information between the systems involved in multi-system protocols
such as quantum teleportation [12]. The DisCoCat model has meanwhile been experimentally
validated for natural language tasks such as word-sense disambiguation within phrases [21, 22].

The DisCoCat model relies on Lambek pregroups [33] as its base type-logic. In category-theoretic
terms, these have a (non-symmetric) compact closed structure when considering types as objects and
type reductions as morphisms. The DisCoCat exploits the fact that finite dimensional vector spaces 
can also be organised within a compact closed category. The first and mainly technical goal of this 
paper is to stress that the choice of a compact or monoidal type-logic is not crucial to the applicability 
of the procedure. To achieve this goal, we have tweaked the distributional compositional model of 
previous work from Lambek pregroups to Lambek monoids, hence developing a vector space model for 
the meaning of natural language sentences parsed within the Lambek calculus. In this paper, we develop 
a similar homomorphic passage via a functor from a monoidal bi-closed category of grammatical types
and reductions to the symmetric monoidal closed category of finite dimensional vector spaces. 

This functorial passage is another contribution of this paper and gives rise to an interesting analogy with 
Topological Quantum Field Theory (TQFT) [4, 5, 29]. A TQFT is also a monoidal functor from the
category of cobordisms into the category of vector spaces and linear maps. From the perspective of 
TQFT, our DisCoCat models form a 'Grammatical Quantum Field Theory' obtained by replacing the 
monoidal category of cobordisms in a TQFT by a certain partially ordered monoid which accounts for 
grammatical structure. This analogy of the compositional distributional model of meaning with TQFT 
was first pointed out by Louis Crane at a workshop in Oxford, August 2008. Similar to the original 
model-theoretic framework of meaning by Montague [41], this semantic framework is obtained via a 
homomorphic passage from sentence formation rules to compositions of meanings of words. However,
contrary to Montague's model, meanings of words and sentences are expressed in terms of vectors
and vector compositions rather than in terms of sets and set-theoretic operations. 

As a result of the compactness of Lambek pregroups, the mechanism of how meanings of words interact 
to produce meanings of sentences has a purely diagrammatic form, which admits an intuitive interpre- 
tation in terms of information flow. By information flow we mean the topology of the two-dimensional 
graphical representation of the operations that produce the meaning of sentences from the meaning of
words. Mathematically, these are expressed in the graphical language of the particular category in which
we model the meaning of words and sentences [55], a practice tracing back to Penrose's work in the
early 1970s [48] that was turned into a formal discipline by Joyal and Street in the 1990s [26]. These
diagrams, for the particular case of compact closed categories, were extensively exploited in the earlier 
DisCoCat models. Here we show how the clasp-string calculus of Baez and Stay [6] can be used to 
provide diagrams for information flows that arise in the Lambek monoids, which are not compact. Our 
ambition is to use this work as a starting point for providing vector space meaning for more expressive 
natural language sentences such as those parsed with Combinatorial Categorial Grammars (CCGs) or 
Lambek-Grishin calculus [44, 8]. The expressive powers of these grammars go beyond that of Lambek
grammars, which are context free. 

Finally, drawing a connection with games seems appropriate in the context of this special issue. Appli- 
cation of games to interpreting and formalising natural language traces back to the 'dialogical logic' of 
Lorenz and Lorenzen [39], who used the dialogue analogy to develop a game semantic model for formulae
of intuitionistic logic. Later, a classical logic version of the theory with a model theoretic focus was
developed by Hintikka and Sandu [24]. A proof-theoretic approach led to the use of linear logic and
proof nets, e.g. see [30, 47]. Independently, another line of research was pursued by linguists who also
used the term 'dialogue games' to provide a semantic model for real-life human-computer dialogues. 
One of the original proposals of this line was based on Grice's pragmatic philosophy of language and 
used component programs and specifications to model dialogues and queries; the setting was applied 
to online sale tools [38]. Later on, a formal model based on belief revision and Bayesian update was
developed for this approach [53]. Our work can be seen as bridging these two (abstract logical and applied
linguistic) communities. Our starting point is a Lambek calculus, with a proof theory similar to
that of intuitionistic multiplicative linear logic. In this calculus, the grammatical structure of a sentence 
is represented as a derivation in a proof tree and depicted in diagrams similar to proof nets. We interpret 
these derivations in vector spaces, seen as a monoidal closed category, and depict the grammatical and 
semantic interactions via Baez-Stay diagrams. General linear logic proof nets are quite different from 
Baez-Stay diagrams, but their compact variants introduced in [3, 1] resemble the compact closed string 
diagrams used in the pregroup derivations. 

2 Partial Order Structures in Linguistics 

Application of partially ordered algebras to linguistics originated in the seminal work of Lambek in the 
50's [31]. In his debut work, Lambek showed how a partially ordered residuated monoid can be used
to analyze the syntactic structure of a fragment of English. He later developed a decision procedure for 
this setting, based on a cut-free sequent calculus. This calculus can be seen as the father of linear logic, 
as it shares with it a monoidal tensor, in a non-commutative form, and hence two linear implications. 
Lambek's approach was based on a partial order of grammatical types; similar ideas have been present 
in the work of Bar-Hillel [7] but were not formalized in algebraic and proof theoretic forms. About half 
a century later in the late 90s, Lambek simplified his original residuated monoids in favor of a simpler 
partial order, which he called a pregroup [33]. Pregroups have been applied to analyse various different
languages, for references see [36]. In this section we review these two structures and their application to
natural language. 
2.1 Lambek Monoids

Lambek calculus [31] is usually a reference to Lambek's sequent calculus; this calculus is similar to that
of intuitionistic multiplicative bi-linear logic, but lacks negation. It has one main binary operation, which 
is noncommutative, hence has a right and a left implication. This calculus is sound and complete with 
regard to partially ordered residuated monoids. 

Recall that for two order-preserving maps $f : A \to B$ and $g : B \to A$ on two partially ordered sets $A$ and
$B$, we say that $f$ is the left adjoint to, or the left residual of, $g$ (or equivalently, $g$ is the right adjoint to $f$
or its right residual), denoted $f \dashv g$, iff

$$\forall a \in A,\ b \in B, \quad f(a) \leq b \iff a \leq g(b)$$

The above condition is equivalent to the following:

$$\forall b \in B,\ f(g(b)) \leq b, \quad \text{and} \quad \forall a \in A,\ a \leq g(f(a))$$

Based on the above definition, a residuated monoid is defined as follows: 

Definition 2.1. A residuated monoid $(L, \leq, \cdot, 1, \multimap, \multimapinv)$ is a partially ordered set $(L, \leq)$, equipped with a
monoid structure $(L, \cdot, 1)$ that preserves the partial order, that is for all $a, b, c \in L$, we have:

$$\text{If } a \leq b \text{ then } a \cdot c \leq b \cdot c \text{ and } c \cdot a \leq c \cdot b$$

The unit element 1 satisfies the following for all $a \in L$:

$$1 \cdot a = a \cdot 1 = a$$

That the monoid is residuated means that $\multimap$ and $\multimapinv$ are the two adjoints of $\cdot$, that is, $a \cdot (-) \dashv a \multimap (-)$
and $(-) \cdot b \dashv (-) \multimapinv b$; explicitly we have:

$$b \leq a \multimap c \iff a \cdot b \leq c \iff a \leq c \multimapinv b, \qquad (1)$$

or, equivalently, using the corollaries of these adjunctions, we have:

$$a \cdot (a \multimap c) \leq c, \quad c \leq a \multimap (a \cdot c), \quad (c \multimapinv b) \cdot b \leq c, \quad c \leq (c \cdot b) \multimapinv b \qquad (2)$$
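
As an illustrative sketch of how the four inequalities in (2) can be read off from the adjunctions in (1), each line below instantiates (1) at a reflexive inequality. Only the symbols of Definition 2.1 are used; the choice of instantiations is ours and is one of several equivalent ways to present the argument.

```latex
\begin{align*}
a \multimap c \leq a \multimap c
  &\;\Longrightarrow\; a \cdot (a \multimap c) \leq c
  && \text{(first equivalence of (1), with } b := a \multimap c\text{)} \\
a \cdot c \leq a \cdot c
  &\;\Longrightarrow\; c \leq a \multimap (a \cdot c)
  && \text{(first equivalence of (1), with } b := c \text{ and bound } a \cdot c\text{)} \\
c \multimapinv b \leq c \multimapinv b
  &\;\Longrightarrow\; (c \multimapinv b) \cdot b \leq c
  && \text{(second equivalence of (1), with } a := c \multimapinv b\text{)} \\
c \cdot b \leq c \cdot b
  &\;\Longrightarrow\; c \leq (c \cdot b) \multimapinv b
  && \text{(second equivalence of (1), with } a := c \text{ and bound } c \cdot b\text{)}
\end{align*}
```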

These structures are also referred to as residuated lattices in the literature. But strictly speaking, a 
residuated monoid is a residuated lattice with the exclusion of its lattice operations, hence the main 
operation of this structure is a monoid multiplication. 

We refer to a partially ordered residuated monoid as a Lambek monoid. These monoids are applied to 
the encoding of the grammatical structure of natural language, whereby elements of the algebra denote 
grammatical types, the monoid multiplication is the juxtaposition of these types, and its unit is the empty 
type. The right and left adjoints are used to denote function-types; these encode the types of the words 
that have a relational role, for example adjectives, verbs, adverbs, conjunctives, and relative pronouns. 
This application procedure is formalised via the following structures. 

Definition 2.2. For $\Sigma$ the set of words of a natural language and $B$ a set of basic grammatical types, a
Lambek type-dictionary is a binary relation $D$, defined as

$$D \subseteq \Sigma \times T(B)$$

where $T(B)$ is the free Lambek monoid generated over $B$ (for the free construction see [31]).

Definition 2.3. A Lambek grammar $G$ is a pair $(D, \mathcal{S})$, where $D$ is a Lambek type-dictionary and $\mathcal{S} \subset B$
is a set of designated types, containing types such as that of a declarative sentence $s$, and a question $q$.

A Lambek grammar is used to define the grammatical sentences of a language as follows.

Definition 2.4. A string of words $w_1 w_2 \cdots w_n$, each of them from $\Sigma$, is said to be grammatical iff for
$1 \leq i \leq n$, there exists a $(w_i, t_i)$ in the dictionary $D$, such that for the designated type $s$ of a sentence in $\mathcal{S}$,
the following partial order holds in $T(B)$:

$$t_1 \cdot t_2 \cdots t_n \leq s$$



As an example, consider a simple language that contains the following words:

$$\Sigma = \{\text{men}, \text{dogs}, \text{cute}, \text{kill}, \text{do}, \text{not}\}$$

For the sake of this example suppose that in this language one can only form declarative sentences 'men
kill dogs', 'men do not kill dogs', and 'men kill cute dogs'. A Lambek type-dictionary that generates the
grammatical sentences of this language has the following basic types:

$$B = \{n, s, j, \sigma\}$$

Here, $n$ stands for a noun phrase, $s$ for a declarative sentence, $j$ for the infinitive of a verb, and $\sigma$ is a
marker type. The type assignments to words of $\Sigma$ are presented in Table 1.



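    Word       Type
    men        $n$
    dogs       $n$
    cute       $n \multimapinv n$
    kill       $(n \multimap s) \multimapinv n$
    to kill    $(\sigma \multimap j) \multimapinv n$
    do         $(n \multimap s) \multimapinv (\sigma \multimap j)$
    not        $(\sigma \multimap j) \multimapinv (\sigma \multimap j)$

Table 1: Type Assignments for the Toy Language $\Sigma$ in a Lambek monoid.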



The Lambek type-dictionary $D$ corresponding to the type assignments of Table 1 is as follows:

$$\begin{aligned}
\{ & (\text{men}, n),\ (\text{dogs}, n),\ (\text{cute}, n \multimapinv n),\ (\text{kill}, (n \multimap s) \multimapinv n),\ (\text{to kill}, (\sigma \multimap j) \multimapinv n), \\
   & (\text{do}, (n \multimap s) \multimapinv (\sigma \multimap j)),\ (\text{not}, (\sigma \multimap j) \multimapinv (\sigma \multimap j)) \}
\end{aligned}$$

Note that the verb 'kill' has two types, represented by two pairs in the type dictionary: the first one
$(\text{kill}, (n \multimap s) \multimapinv n)$ is for its transitive role, e.g. in the sentence 'men kill dogs' and the second one
$(\text{kill}, (\sigma \multimap j) \multimapinv n)$ for its infinitive role, e.g. in the sentence 'men do not kill dogs'.



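By Definition 2.4 the sentence 'men kill dogs' is grammatical, since if we apply the monoid multiplication
to the types that correspond to the words, we obtain the term $n \cdot ((n \multimap s) \multimapinv n) \cdot n$, which has the
following reduction:

$$\begin{aligned}
n \cdot ((n \multimap s) \multimapinv n) \cdot n & \leq \\
n \cdot (n \multimap s) & \leq s
\end{aligned}$$

The sentence 'men kill cute dogs' is also grammatical; it has the following reduction:

$$\begin{aligned}
n \cdot ((n \multimap s) \multimapinv n) \cdot (n \multimapinv n) \cdot n & \leq \\
n \cdot ((n \multimap s) \multimapinv n) \cdot n & \leq \\
n \cdot (n \multimap s) & \leq s
\end{aligned}$$

Similarly, the sentence 'men do not kill dogs' is grammatical, according to the following reduction:

$$\begin{aligned}
n \cdot ((n \multimap s) \multimapinv (\sigma \multimap j)) \cdot ((\sigma \multimap j) \multimapinv (\sigma \multimap j)) \cdot ((\sigma \multimap j) \multimapinv n) \cdot n & \leq \\
n \cdot ((n \multimap s) \multimapinv (\sigma \multimap j)) \cdot ((\sigma \multimap j) \multimapinv (\sigma \multimap j)) \cdot (\sigma \multimap j) & \leq \\
n \cdot ((n \multimap s) \multimapinv (\sigma \multimap j)) \cdot (\sigma \multimap j) & \leq \\
n \cdot (n \multimap s) & \leq s
\end{aligned}$$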
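
The three reductions use only the two application inequalities from (2), so they can be checked mechanically. The following is a minimal sketch under that assumption: it searches over application steps only and is therefore not a decision procedure for the full Lambek calculus. The names `under`, `over`, `one_step`, `derives`, `LEXICON` and `is_grammatical` are hypothetical helpers introduced here for illustration.

```python
from functools import lru_cache

def under(a, c):
    """The type a -o c: expects an a on its left and yields a c."""
    return ("under", a, c)

def over(c, b):
    """The type c o- b: expects a b on its right and yields a c."""
    return ("over", c, b)

def one_step(types):
    """Yield every type string reachable by one application step."""
    for i in range(len(types) - 1):
        left, right = types[i], types[i + 1]
        # a . (a -o c) <= c
        if isinstance(right, tuple) and right[0] == "under" and right[1] == left:
            yield types[:i] + (right[2],) + types[i + 2:]
        # (c o- b) . b <= c
        if isinstance(left, tuple) and left[0] == "over" and left[2] == right:
            yield types[:i] + (left[1],) + types[i + 2:]

@lru_cache(maxsize=None)
def derives(types, target):
    """True if the tuple of types reduces to the single type `target`."""
    if types == (target,):
        return True
    return any(derives(t, target) for t in one_step(types))

# Basic types and the toy type-dictionary of Table 1.
n, s, j, sigma = "n", "s", "j", "sigma"
iv = under(sigma, j)
LEXICON = {
    "men": n, "dogs": n, "cute": over(n, n),
    "kill": over(under(n, s), n),
    "to kill": over(iv, n),      # infinitive entry of the dictionary
    "do": over(under(n, s), iv),
    "not": over(iv, iv),
}

def is_grammatical(words, target=s):
    return derives(tuple(LEXICON[w] for w in words), target)
```

For 'men do not kill dogs' the infinitive entry is selected explicitly, as in `is_grammatical(["men", "do", "not", "to kill", "dogs"])`, because this sketch does not try alternative dictionary entries for the same word.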






2.1.1 Lambek Pregroups 



In 1999, Lambek revisited his monoidal structures and introduced a simplification [33]. Instead of working
in a partially ordered residuated monoid, he argued for a pregroup, which has one non-residuated
binary operation, but where each element of the partial order is required to have a left and a right adjoint. 
More precisely, we have: 

Definition 2.5. A Lambek pregroup is a partially ordered unital monoid $(P, \leq, \cdot, 1, (-)^l, (-)^r)$ where each
element has a left and a right adjoint. That is, for every $p \in P$, there is a $p^r$ and a $p^l$ in $P$, which satisfy
the following four inequalities:

$$p \cdot p^r \leq 1 \leq p^r \cdot p$$
$$p^l \cdot p \leq 1 \leq p \cdot p^l$$

From this definition it follows that adjoints are unique and reverse the order, that is if we have $p \leq q$
for $p, q \in P$ then it follows that $q^l \leq p^l$ and also that $q^r \leq p^r$. One can also show that the unit is self
adjoint, that is $1^r = 1 = 1^l$, that opposite adjoints cancel out, that is $(p^r)^l = (p^l)^r = p$, but same adjoints
iterate, for instance $(p^r)^r$ is not necessarily equal to $p$ and neither is $(p^l)^l$. Another nice property is that
the monoid multiplication is self adjoint, that is $(p \cdot q)^r = q^r \cdot p^r$ and also $(p \cdot q)^l = q^l \cdot p^l$. These properties
are all proved and elaborated on by Lambek [33].



Apart from linguistics, pregroups have concrete applications in other fields such as number theory [34].
Notably, an example of a pregroup structure on natural numbers is the set of all unbounded monotone
functions on the set of integers $\mathbb{Z}$. Here, the partial order is the natural ordering of integers extended to
functions, that is for $f, g \in \mathbb{Z}^{\mathbb{Z}}$, we have:

$$f \leq g \quad \text{iff} \quad f(n) \leq g(n),\ \forall n \in \mathbb{Z}$$

The monoid multiplication is the composition of functions and its unit is the identity function, that is for
$n \in \mathbb{Z}$ we have:

$$(f \cdot g)(n) = f(g(n)) \quad \text{and} \quad 1(n) = n$$

The left and right adjoints of each function are computed by using their canonical definitions and via the
supremum and infimum operations on integers, again extended to functions, as follows:

$$f^r(n) = \bigvee \{m \in \mathbb{Z} \mid f(m) \leq n\}$$

$$f^l(n) = \bigwedge \{m \in \mathbb{Z} \mid n \leq f(m)\}$$

As an example, take $f(x) = 2x$, then compute $f^r(x) = \lfloor x/2 \rfloor$ and $f^l(x) = \lfloor (x + 1)/2 \rfloor$, where $\lfloor x \rfloor$ is the
biggest integer less than or equal to $x$ (for details of these computations and more examples see [34]).
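
The doubling example can be checked numerically. The sketch below is only an illustration on a finite range of integers: it compares the closed forms against a brute-force supremum and infimum, and then tests the four pregroup inequalities pointwise. The function names and the search bound are assumptions made for this example.

```python
def f(x):
    return 2 * x

def f_right(n):
    # Closed form of the right adjoint: floor(n / 2).
    return n // 2

def f_left(n):
    # Closed form of the left adjoint: floor((n + 1) / 2).
    return (n + 1) // 2

def brute_right(n, bound=50):
    # Supremum of {m | f(m) <= n}, searched over a finite window.
    return max(m for m in range(-bound, bound + 1) if f(m) <= n)

def brute_left(n, bound=50):
    # Infimum of {m | n <= f(m)}, searched over a finite window.
    return min(m for m in range(-bound, bound + 1) if n <= f(m))

def pregroup_inequalities_hold(n):
    # f . f^r <= 1 <= f^r . f   and   f^l . f <= 1 <= f . f^l, evaluated at n.
    return (f(f_right(n)) <= n <= f_right(f(n))
            and f_left(f(n)) <= n <= f(f_left(n)))

sample = range(-20, 21)
closed_forms_agree = all(
    f_right(n) == brute_right(n) and f_left(n) == brute_left(n) for n in sample
)
inequalities_hold = all(pregroup_inequalities_hold(n) for n in sample)
```

Python's `//` rounds towards negative infinity, which is what makes the closed forms agree with the floor function on negative integers as well.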



Pregroups are applied to the analysis of syntax in the same way as Lambek monoids. The types are 
translated from their monoidal implicative form to a pregroup adjoint form as follows: 



$$p \multimap q \;\leadsto\; p^r \cdot q$$

$$p \multimapinv q \;\leadsto\; p \cdot q^l$$



The pregroup version of Table 1 is as follows:

    Word       Type
    men        $n$
    dogs       $n$
    cute       $n \cdot n^l$
    kill       $n^r \cdot s \cdot n^l$
    to kill    $\sigma^r \cdot j \cdot n^l$
    do         $n^r \cdot s \cdot j^l \cdot \sigma$
    not        $\sigma^r \cdot j \cdot j^l \cdot \sigma$

Table 2: Type Assignments for the Toy Language $\Sigma$ in a Lambek pregroup.



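The reduction corresponding to the sentence 'men kill dogs' is as follows:

$$n \cdot n^r \cdot s \cdot n^l \cdot n \leq 1 \cdot s \cdot 1 = s$$

For 'men kill cute dogs', we have:

$$n \cdot n^r \cdot s \cdot n^l \cdot n \cdot n^l \cdot n \leq 1 \cdot s \cdot 1 \cdot 1 = s$$

And for 'men do not kill dogs' we have:

$$\begin{aligned}
n \cdot n^r \cdot s \cdot j^l \cdot \sigma \cdot \sigma^r \cdot j \cdot j^l \cdot \sigma \cdot \sigma^r \cdot j \cdot n^l \cdot n & \leq \\
1 \cdot s \cdot j^l \cdot 1 \cdot j \cdot j^l \cdot 1 \cdot j \cdot 1 & = \\
s \cdot j^l \cdot j \cdot j^l \cdot j & \leq \\
s \cdot 1 \cdot 1 & = s
\end{aligned}$$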
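
These pregroup reductions consist only of contractions of adjacent types, which makes them easy to check by a small search. The following is a minimal sketch, not an efficient parser: a simple type is encoded as a pair (base, z), with z = -1 for a left adjoint, 0 for the plain type and +1 for a right adjoint, so that both $p \cdot p^r \leq 1$ and $p^l \cdot p \leq 1$ become the single rule "(a, z)(a, z+1) cancels". The encoding and all names below are assumptions made for this illustration.

```python
from functools import lru_cache

def contractions(types):
    """Yield every string obtained by cancelling one adjacent pair."""
    for i in range(len(types) - 1):
        (a, z), (b, w) = types[i], types[i + 1]
        if a == b and w == z + 1:
            yield types[:i] + types[i + 2:]

@lru_cache(maxsize=None)
def reduces_to(types, target):
    """True if the tuple of simple types contracts to exactly `target`."""
    if types == target:
        return True
    return any(reduces_to(t, target) for t in contractions(types))

N, NR, NL = ("n", 0), ("n", 1), ("n", -1)
S, J, JL = ("s", 0), ("j", 0), ("j", -1)
SIG, SIGR = ("sigma", 0), ("sigma", 1)

# Pregroup type-dictionary corresponding to Table 2.
LEXICON = {
    "men": (N,), "dogs": (N,), "cute": (N, NL),
    "kill": (NR, S, NL),
    "to kill": (SIGR, J, NL),    # infinitive entry of the dictionary
    "do": (NR, S, JL, SIG),
    "not": (SIGR, J, JL, SIG),
}

def is_grammatical(words, target=(S,)):
    types = tuple(t for w in words for t in LEXICON[w])
    return reduces_to(types, target)
```

The search is exhaustive and therefore exponential in the worst case; it is adequate for the toy sentences here but is not meant as a practical parsing algorithm.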

The pregroup reductions are usually depicted in cancellation diagrams using curved strings (cups), which
connect the types that are being cancelled in each step, and lines, which depict types which have not been
cancelled. Nested strings are used to denote multiple steps. For example, the cancellation diagram of 'men
kill dogs' is as follows (the $\cdot$'s are dropped).

    men      kill     dogs
     n    n^r  s  n^l   n
     \____/    |  \_____/

The diagram for the grammatical structure of 'men do not kill dogs' is as follows:

    men         do                not             kill    dogs
     n    n^r  s  j^l  σ    σ^r  j  j^l  σ    σ^r  j  n^l   n
     \____/    |  |    \____/    |  |    \____/    |  \_____/
               |  \______________/  \______________/



These diagrams provide an intuitive reading of the grammatical structure of the sentence. For instance, 
in the first diagram we read that for a transitive sentence to be grammatical, the verb has to interact with 
its subject and object, depicted via the curved strings, hence producing a sentence via the line. 

The challenge of using type-logics to formalize natural language grammar resides in assigning the right 
basic and compound types to the words of the language such that they would generate all the grammatical 
statements of the language and do not over-generate. Different type dictionaries have been suggested for 
different languages, e.g. see [35, 42]. The types used in this paper are from [50]; these are chosen to
keep the setting simple and to be able to parse our example sentences. Parsing a more complex language 
needs a more elaborate type-dictionary. 

Additional operators, either in terms of modalities or new additive binary connectives have been added 
to type-logics to increase their expressive power P31 EUl 1431 ESI . The expressive power of Lambek 
monoids and pregroups is the same as that of context-free grammars of the Chomsky hierarchy [49, 9].
The extensions with modalities and additives increase it to weak variants of context-sensitivity, such as
mild context-sensitivity. In the proof-theoretic calculi of these algebras, the grammatical reductions are
depicted via proof nets [54, 46], which offer a richer analysis of the decomposition of types.

The above type-logical structures do represent the grammatical structure of a language in a compositional 
way, but do not offer a model for the lexical semantics of the words of a language. In such type-logics, 
words are only modelled by their grammatical roles, hence words that have the same grammatical role 
but different lexical meanings cannot be distinguished from one another. For instance, the words 'dog' 
and 'men' are both represented only by their grammatical role n, ignoring the fact that they have different 
lexical meanings. The same problem exists for transitive verbs such as 'kill' and 'eat', adjectives such as 
'cute' and 'green', and so on. For a subspace of a real vector space, see Figure 1.
3 Distributional Models of Meaning



Orthogonal to the type-logical models, distributional models of meaning are mainly concerned with 
lexical semantics. Best described by Firth's dictum "You shall know a word by the company it
keeps" [18], these models are based on the idea that words that often appear in the same context have
similar meanings. For example, the words 'cat' and 'dog' often appear in the context of 'pet', 'furry', 
'owner', and 'food', hence they have similar meanings. Similarly, 'kill' and 'murder' often appear in the 
context of 'police', 'gun', and 'arrest', hence these also have similar meanings. 

To formalize this idea, one builds a finite dimensional vector space N whose basis vectors are the context 
words. Ideally, the context words are all the lemmatized words of a corpus of interest. Typically, these 
are reduced to a more refined set, based on the domain of application. One then fixes a window of k 
words and builds a vector for each word, representing its lexical meaning. We denote the meaning vector
of a word by $\overrightarrow{word} = \sum_i c_i \overrightarrow{n_i}$, for $c_i$ a real number and $\overrightarrow{n_i}$ a basis vector of $N$. The $c_i$ weights are obtained
by first counting how many times a word has appeared within $k$ (e.g. 5) words of each context and then
normalising this count. The most popular normalisation measure is Term-Frequency (TF) divided by
Inverse-Document-Frequency (IDF); it assigns a degree of importance to the appearance of a word in a
document. TF/IDF is the proportion of the number of times a word appears in the corpus to the frequency
of the total number of words in the corpus.

The distance between the meaning vectors, for instance the cosine of their angle, provides a good measure 
of similarity of meaning. For example, in the vector space of Figure 1 the angle between meaning vectors
of 'cat' and 'dog' is small and so is the angle between meaning vectors of 'kill' and 'murder'. Various 
similarity measures have been implemented on large scale data (up to a billion words) to build high 
dimensional vector spaces (tens of thousands of basis vectors). These have been successfully applied to 
automatic generation of thesauri; for example see [15].
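
The window-based construction and the cosine measure can be sketched in a few lines. The example below is a toy illustration only: the corpus, the list of context words and the function names are invented for the purpose, and the weights are raw co-occurrence counts rather than a TF/IDF-normalised measure.

```python
import math
from collections import Counter

def cooccurrence_vectors(tokens, context_words, k):
    """For every token, count how often each context word occurs within k positions."""
    counts = {}
    for i, word in enumerate(tokens):
        window = tokens[max(0, i - k):i] + tokens[i + 1:i + 1 + k]
        row = counts.setdefault(word, Counter())
        row.update(c for c in window if c in context_words)
    # Coordinates follow the fixed order of the context words (the basis).
    return {w: [row[c] for c in context_words] for w, row in counts.items()}

def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is the zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus; a real model would use a large lemmatized corpus.
corpus = ("furry pet cat food owner furry pet dog food owner "
          "police gun kill arrest police gun murder arrest").split()
context = ["pet", "furry", "owner", "food", "police", "gun", "arrest"]
vectors = cooccurrence_vectors(corpus, context, k=2)

sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_kill_murder = cosine(vectors["kill"], vectors["murder"])
sim_cat_kill = cosine(vectors["cat"], vectors["kill"])
```

In this toy corpus 'cat' and 'dog' share all of their contexts and 'cat' and 'kill' share none, so the first similarity is high and the last is zero; with real data the differences are of course gradual.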



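[Figure: meaning vectors, including 'kill' and 'murder', drawn against context-word axes including 'food', 'owner' and 'police'.]

Figure 1: A Subspace of a Real Vector Space Model of Meaning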

Contrary to the type-logical models, the empirical distributional models ignore the grammatical structure
of language. For instance, these models do not offer a canonical way of building meaning vectors for
sentences. A compositional solution to this problem should use vector composition operators to combine
meaning vectors of words and obtain meaning vectors for sentences. The simple operations widely
studied in the literature, for instance in [40], are addition ($+$) and component-wise multiplication ($\odot$).
These are commutative and hence do not even respect the word order. If $\overrightarrow{vw} = \overrightarrow{v} + \overrightarrow{w}$ or $\overrightarrow{v} \odot \overrightarrow{w}$, then
$\overrightarrow{vw} = \overrightarrow{wv}$, leading to unwelcome equalities such as the following:

$$\overrightarrow{\text{men kill dogs}} = \overrightarrow{\text{dogs kill men}}$$

Inspired by the connectionist model of meaning in Cognitive Science [56], a combination of Kronecker
product (which is non-commutative) and syntactic relations has been suggested in [11] as a possible
solution. The problem with this model is that the dimensionalities of sentence vectors differ for sentences
with different grammatical structures, barring them from being compared. For instance, in this model 
one cannot compare meanings of the two sentences 'men kill dogs' and 'men kill', since they live in 
different spaces. 
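
Both problems can be seen on a small numerical example. The sketch below uses made-up two-dimensional word vectors and plain Python lists; it illustrates only the algebraic behaviour of the three operations and omits the syntactic relations that the Kronecker-based proposal also uses.

```python
def add(v, w):
    return [a + b for a, b in zip(v, w)]

def pointwise(v, w):
    return [a * b for a, b in zip(v, w)]

def kronecker(v, w):
    # Kronecker product of two vectors, flattened to a list of length len(v) * len(w).
    return [a * b for a in v for b in w]

# Hypothetical meaning vectors in a two-dimensional space.
men, kill, dogs = [1.0, 2.0], [3.0, 5.0], [7.0, 11.0]

# Commutative operations cannot tell the two word orders apart.
svo_add = add(add(men, kill), dogs)
ovs_add = add(add(dogs, kill), men)
svo_mul = pointwise(pointwise(men, kill), dogs)
ovs_mul = pointwise(pointwise(dogs, kill), men)

# The Kronecker product respects word order, but the dimension grows with length.
svo_kron = kronecker(kronecker(men, kill), dogs)
ovs_kron = kronecker(kronecker(dogs, kill), men)
sv_kron = kronecker(men, kill)
```

Here 'men kill dogs' is represented in an 8-dimensional space and 'men kill' in a 4-dimensional one, so the two Kronecker vectors cannot be compared by an inner product.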

4 Distributional Compositional Categorical Model of Meaning 

In previous work [13, 21], we combined the pregroup type-logic with the distributional model and developed
a framework that produces vectors for meanings of sentences, from their grammatical structure
and the vectors of the words therein. This framework, summarised below, is the Cartesian product of two
compact closed categories: that of pregroups with that of finite dimensional vector spaces.

4.1 Pregroups and Vector Spaces as Compact Closed Categories 

Lambek's type-logics closely relate to abstract categorical structures [32, 51]. To see this connection, we
recall some definitions from category theory. A compact closed category has objects $A, B$, morphisms
$f : A \to B$, a monoidal tensor $A \otimes B$ that has a unit $I$, and for each object $A$ two objects $A^r$ and $A^l$ together
with the following morphisms:

$$A \otimes A^r \xrightarrow{\ \epsilon^r_A\ } I \xrightarrow{\ \eta^r_A\ } A^r \otimes A$$
$$A^l \otimes A \xrightarrow{\ \epsilon^l_A\ } I \xrightarrow{\ \eta^l_A\ } A \otimes A^l$$

The above satisfy the following equalities, where $1_A$ is the identity morphism on object $A$:

$$(1_A \otimes \epsilon^l_A) \circ (\eta^l_A \otimes 1_A) = 1_A \qquad (\epsilon^r_A \otimes 1_A) \circ (1_A \otimes \eta^r_A) = 1_A$$

$$(\epsilon^l_A \otimes 1_{A^l}) \circ (1_{A^l} \otimes \eta^l_A) = 1_{A^l} \qquad (1_{A^r} \otimes \epsilon^r_A) \circ (\eta^r_A \otimes 1_{A^r}) = 1_{A^r}$$

These equalities are known as the yanking equalities. Note we do not assume symmetry of the tensor.

A pregroup is a compact closed category [51], to which we refer as Preg. The elements of the partially
ordered set $p, q \in P$ are the objects of Preg, the partial order relation provides the morphisms, that is,
there exists a morphism of type $p \to q$ iff $p \leq q$; we denote this morphism by $[p \leq q]$. Compositions
of morphisms are given by transitivity and the identities by reflexivity of the partial order. The monoid
multiplication and its unit (denoted by 1 rather than $I$) provide the tensor of the category, and the four
adjoint inequalities provide the epsilon and eta morphisms, that is we have:

$$\epsilon^r = [p \cdot p^r \leq 1] \qquad \epsilon^l = [p^l \cdot p \leq 1]$$

$$\eta^r = [1 \leq p^r \cdot p] \qquad \eta^l = [1 \leq p \cdot p^l]$$

Finite dimensional vector spaces also form a compact closed category, which we refer to as FVect. Finite
dimensional vector spaces $V, W$ are objects of this category; linear maps $f : V \to W$ are its morphisms,
and composition is the composition of linear maps. The tensor product between the spaces $V \otimes W$ is the
monoidal tensor whose unit is a field, in our case $\mathbb{R}$. As opposed to the tensor of the pregroup, this tensor
is symmetric, hence we have a natural isomorphism $V \otimes W \cong W \otimes V$. As a result of symmetry of the
tensor, the two adjoints collapse to one and we obtain $V^l \cong V^r \cong V^*$, where $V^*$ is the dual of $V$. Since the
basis vectors of our vector spaces are fixed, we furthermore obtain that $V^* \cong V$. Finally, a vector $\overrightarrow{v} \in V$
is represented by the morphism $\mathbb{R} \xrightarrow{\ \overrightarrow{v}\ } V$.
Given a basis $\{r_i\}_i$ for a vector space $V$, the epsilon maps are given by the inner product extended by
linearity, that is we have:

$$\epsilon^l = \epsilon^r : V^* \otimes V \to \mathbb{R} \quad :: \quad \sum_{ij} c_{ij}\, \psi_i \otimes \phi_j \;\mapsto\; \sum_{ij} c_{ij} \langle \psi_i \mid \phi_j \rangle$$

Eta maps are defined as follows:

$$\eta^l = \eta^r : \mathbb{R} \to V \otimes V^* \quad :: \quad 1 \mapsto \sum_i r_i \otimes r_i$$

Here $1 \in \mathbb{R}$ is the number 1, and the above assignment extends to all other numbers by linearity.
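
For a fixed orthonormal basis these two maps are very concrete: eta is the identity matrix viewed as an element of $V \otimes V^*$, and epsilon sums the diagonal coefficients of an element of $V^* \otimes V$. The following sketch, which assumes an orthonormal basis and represents tensors as nested Python lists, checks the yanking equality $(1_V \otimes \epsilon) \circ (\eta \otimes 1_V) = 1_V$ on a sample vector. The function names are hypothetical.

```python
def eta(dim):
    """Coefficients of sum_i r_i (x) r_i in V (x) V*, i.e. the identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]

def epsilon(t):
    """Apply epsilon to t = sum_ij c_ij r_i (x) r_j; for an orthonormal basis
    <r_i | r_j> is 1 if i == j and 0 otherwise, so only the diagonal survives."""
    return sum(t[i][i] for i in range(len(t)))

def yank(v):
    """Compute (1 (x) epsilon) after (eta (x) 1) applied to v."""
    dim = len(v)
    e = eta(dim)
    # (eta (x) v) has coefficients e[i][j] * v[k]; epsilon contracts indices j and k.
    return [
        epsilon([[e[i][j] * v[k] for k in range(dim)] for j in range(dim)])
        for i in range(dim)
    ]

sample = [1.0, -2.0, 3.5]
yanked = yank(sample)
```

The result `yanked` coincides with `sample`, which is the vector space instance of straightening a bent wire in the diagrammatic calculus.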

A DisCoCat is the cartesian product of FVect and Preg. This is a category: its objects are pairs $(W, p)$
for $W$ a finite dimensional vector space and $p$ a pregroup type. Its morphisms are pairs of morphisms
$(f : V \to W, [p \leq q])$, for $f$ a linear map and $p \leq q$ a pregroup partial order. Composition is obtained in a
pointwise fashion by composing linear maps and transitivity of the order. The identity morphism for an
object $(V, p)$ is a pair of identity morphisms $(1_V, [p \leq p])$ from FVect and Preg.

Definition 4.1. The distributional compositional categorical (DisCoCat) model of meaning is the category
FVect $\times$ Preg, equipped with a tensor product given by pointwise tensor of Preg and FVect, that
is $(V, p) \otimes (W, q) = (V \otimes W, p \cdot q)$, whose unit is $(\mathbb{R}, 1)$.

It has been shown in [13] that the compact structure carries over from Preg and FVect to a DisCoCat,
with epsilon and eta maps given in a pointwise fashion as follows:

$$\epsilon^r = (V \otimes V^* \to \mathbb{R},\ [p \cdot p^r \leq 1]) \qquad \epsilon^l = (V^* \otimes V \to \mathbb{R},\ [p^l \cdot p \leq 1])$$

$$\eta^r = (\mathbb{R} \to V^* \otimes V,\ [1 \leq p^r \cdot p]) \qquad \eta^l = (\mathbb{R} \to V \otimes V^*,\ [1 \leq p \cdot p^l])$$

4.2 Meaning Vectors for Strings of Words 

The pair $(\overrightarrow{w} \in W, p)$ is an object of FVect $\times$ Preg. We use this object to represent the semantics of a word
$w$ and refer to it as the meaning of the word $w$. It consists of a vector space $W$ where the meaning vector
$\overrightarrow{w}$ of word $w$ that has pregroup type $p$ lives. Based on this notion, the meaning vector representing the
semantics of a sentence is obtained according to the following definition.

Definition 4.2. We define the meaning vector $\overrightarrow{w_1 \cdots w_n}$ of a string of words $w_1 \cdots w_n$ to be:

$$\overrightarrow{w_1 \cdots w_n} := f(\overrightarrow{w_1} \otimes \cdots \otimes \overrightarrow{w_n})$$

where for $(\overrightarrow{w_i} \in W_i, p_i)$ the meaning of the word $w_i$, the linear map $f$ is built by substituting each $p_i$ in the
pregroup reduction map of the string $[p_1 \cdots p_n \leq x]$ with $W_i$.

Thus for $\alpha = [p_1 \cdots p_n \leq x]$ a morphism in Preg and $f = \alpha[p_i / W_i]$ a linear map in FVect, the
following is a morphism in FVect $\times$ Preg:

$$(W_1 \otimes \cdots \otimes W_n,\ p_1 \cdots p_n) \xrightarrow{\ (f, \alpha)\ } (X, x)$$

For example, to assign a meaning vector to an adjective-noun phrase, we start with the meanings of
adjective and noun, which have the following forms:

$$(\overrightarrow{adj} \in W \otimes W,\ n \cdot n^l) \qquad (\overrightarrow{noun} \in W,\ n)$$

Here, we are assigning the vector space $W$ to the noun and assuming that the distributional meaning
vectors of nouns live in it, that is $\overrightarrow{noun} \in W$. The meaning vector of the adjective is assumed
to be an element of $W \otimes W$, hence representable by $\sum_{lm} c_{lm}\, \vec{w}_l \otimes \vec{w}_m$, for $\vec{w}_l, \vec{w}_m$ basis vectors of $W$. The
meaning vectors of the noun and adjective are represented by the following morphisms:

$$\mathbb{R} \xrightarrow{\ \overrightarrow{noun}\ } W \qquad \mathbb{R} \xrightarrow{\ \overrightarrow{adj}\ } W \otimes W$$

The pregroup reduction map of the adjective-noun phrase is as follows:

$$\alpha = [n \cdot n^l \cdot n \leq n] = 1_n \cdot \epsilon^l_n$$

Substituting each type in $\alpha$ with the vector space associated with the words of that type, we obtain the
linear map $f$ corresponding to $\alpha$ to be the morphism $1_W \otimes \epsilon_W$. The meaning vector of an adjective-noun
phrase is computed by applying the linear map corresponding to the pregroup parse of the adjective-noun
phrase, that is $1_W \otimes \epsilon_W$, to the tensor product of the meaning vector of the adjective $\overrightarrow{adj}$ with
the meaning vector of the noun $\overrightarrow{noun}$. The categorical morphism corresponding to this computation is as
follows:



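$$\overrightarrow{adj\ noun} = (1_W \otimes \epsilon_W)(\overrightarrow{adj} \otimes \overrightarrow{noun})$$

$$\mathbb{R} \cong \mathbb{R} \otimes \mathbb{R} \xrightarrow{\ \overrightarrow{adj} \otimes \overrightarrow{noun}\ } W \otimes W \otimes W \xrightarrow{\ 1_W \otimes \epsilon_W\ } W \otimes \mathbb{R} \cong W$$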



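The concrete value of the vector corresponding to the above morphism is:

$$\overrightarrow{adj\ noun} := (1_W \otimes \epsilon_W)\Big(\big(\sum_{lm} c_{lm}\, \vec{w}_l \otimes \vec{w}_m\big) \otimes \overrightarrow{noun}\Big) = \sum_{lm} c_{lm}\, \vec{w}_l\, \langle \vec{w}_m \mid \overrightarrow{noun} \rangle$$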
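The contraction in this formula can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes an orthonormal basis, so that the inner product of a basis vector with the noun is simply the corresponding coordinate of the noun. The function name and all numbers are hypothetical.

```python
def adjective_noun(adj, noun):
    """Apply (1_W tensor epsilon_W): contract the adjective's second index with the noun.

    adj  : matrix c[l][m] of coefficients over the basis vectors w_l (x) w_m
    noun : coordinates of the noun vector in the same (assumed orthonormal) basis
    """
    return [sum(c_lm * noun[m] for m, c_lm in enumerate(row)) for row in adj]


# Hypothetical 2-dimensional noun space W; the numbers are illustrative only.
cute = [[1.0, 0.5],
        [0.0, 2.0]]
dogs = [3.0, 4.0]

cute_dogs = adjective_noun(cute, dogs)
print(cute_dogs)
```

In coordinates, the adjective therefore acts on the noun as an ordinary matrix-vector product.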



The meaning vector of a sentence with an adjective-noun phrase is computed by substituting the above 
for the meaning vector of the adjective-noun phrase when computing the meaning of the sentence. For 
instance, consider the sentence 'men kill dogs'; we have the following for the meaning vectors of words: 

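$$(\overrightarrow{men} \in W,\ n), \qquad (\overrightarrow{dogs} \in W,\ n), \qquad (\overrightarrow{kill} \in W \otimes S \otimes W,\ n^r \cdot s \cdot n^l)$$

So we are assigning the vector space $W$ to nouns and the vector space $S$ to sentences; as a result, the
vector space of transitive verbs, and in particular of 'kill', becomes $W \otimes S \otimes W$, each verb being
representable by $\sum_{ijk} c_{ijk}\, \vec{w}_i \otimes \vec{s}_j \otimes \vec{w}_k$, for $\vec{w}_i, \vec{w}_k$ basis vectors of $W$ and $\vec{s}_j$ a basis vector of $S$. The
pregroup reduction map of the sentence is as follows:

$$\alpha = [n \cdot n^r \cdot s \cdot n^l \cdot n \leq s] = \epsilon^r_n \cdot 1_s \cdot \epsilon^l_n$$

Substituting each type in $\alpha$ with the vector space associated with the words of that type, we obtain $f$ to
be the morphism $\epsilon_W \otimes 1_S \otimes \epsilon_W$. Given the distributional meanings of each word, that is $\overrightarrow{men}$, $\overrightarrow{kill}$, $\overrightarrow{dogs}$,
the meaning of the sentence will be:

$$\overrightarrow{men\ kill\ dogs} = (\epsilon_W \otimes 1_S \otimes \epsilon_W)(\overrightarrow{men} \otimes \overrightarrow{kill} \otimes \overrightarrow{dogs})$$

$$W \otimes W \otimes S \otimes W \otimes W \xrightarrow{\ \epsilon_W \otimes 1_S \otimes \epsilon_W\ } S$$

This can be further simplified as follows:

$$\begin{aligned}
\overrightarrow{men\ kill\ dogs} &= f(\overrightarrow{men} \otimes \overrightarrow{kill} \otimes \overrightarrow{dogs}) \\
&= (\epsilon_W \otimes 1_S \otimes \epsilon_W)(\overrightarrow{men} \otimes \overrightarrow{kill} \otimes \overrightarrow{dogs}) \\
&= (\epsilon_W \otimes 1_S \otimes \epsilon_W)\Big(\overrightarrow{men} \otimes \big(\sum_{ijk} c_{ijk}\, \vec{w}_i \otimes \vec{s}_j \otimes \vec{w}_k\big) \otimes \overrightarrow{dogs}\Big) \\
&= \sum_{ijk} c_{ijk}\, \langle \overrightarrow{men} \mid \vec{w}_i \rangle\, \langle \vec{w}_k \mid \overrightarrow{dogs} \rangle\, \vec{s}_j
\end{aligned}$$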
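To make the final sum concrete, here is a small sketch of the same contraction in coordinates. It is an illustration under stated assumptions rather than the authors' code: bases are taken to be orthonormal, the verb is stored as a nested list `verb[i][j][k]` holding the coefficients $c_{ijk}$, and the function name and numbers are hypothetical.

```python
def transitive_sentence(subj, verb, obj):
    """Apply (epsilon_W tensor 1_S tensor epsilon_W) to subj (x) verb (x) obj.

    verb[i][j][k] holds c_ijk over w_i (x) s_j (x) w_k; the result lives in S.
    """
    dim_s = len(verb[0])
    out = [0.0] * dim_s
    for i, plane in enumerate(verb):
        for j, row in enumerate(plane):
            for k, c_ijk in enumerate(row):
                # <subj | w_i> and <w_k | obj> are coordinates in an orthonormal basis
                out[j] += c_ijk * subj[i] * obj[k]
    return out


# Hypothetical toy data: W and S are both 2-dimensional here.
kill = [[[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 2.0], [1.0, 0.0]]]
men = [1.0, 2.0]
dogs = [3.0, 1.0]

sentence = transitive_sentence(men, kill, dogs)
print(sentence)
```

The subject and object indices are summed away, and only the sentence index $j$ survives, which is why the result is a vector of $S$.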

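Now, we can compute the meaning vector of the sentence 'men kill cute dogs'. The pregroup morphism
of the grammatical reduction of this sentence is $\epsilon^r_n \cdot 1_s \cdot \epsilon^l_n \cdot \epsilon^l_n$, whose linear map is $\epsilon_W \otimes 1_S \otimes \epsilon_W \otimes \epsilon_W$.
Applying this map to the tensor product of the meaning vectors of the words provides us with the
following meaning vector for the sentence:

$$\begin{aligned}
\overrightarrow{men\ kill\ cute\ dogs} &= (\epsilon_W \otimes 1_S \otimes \epsilon_W \otimes \epsilon_W)(\overrightarrow{men} \otimes \overrightarrow{kill} \otimes \overrightarrow{cute} \otimes \overrightarrow{dogs}) \\
&= \sum_{ijk} c_{ijk}\, \langle \overrightarrow{men} \mid \vec{w}_i \rangle\, \langle \vec{w}_k \mid \overrightarrow{cute}(\overrightarrow{dogs}) \rangle\, \vec{s}_j
\end{aligned}$$

where $\overrightarrow{cute}(\overrightarrow{dogs}) = \sum_{lm} c_{lm}\, \vec{w}_l\, \langle \vec{w}_m \mid \overrightarrow{dogs} \rangle$ is the adjective-noun vector.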



As an example of a more complicated sentence, consider 'men do not kill dogs'. For the meaning vectors
of the auxiliary 'do', the infinitive 'kill', and the negation word 'not', we have the following [13, 52]:

$$(\overrightarrow{kill} \in W \otimes S \otimes W,\ \sigma^r \cdot j \cdot n^l), \quad (\overrightarrow{do} \in W \otimes S \otimes S \otimes W,\ n^r \cdot s \cdot j^l \cdot \sigma), \quad (\overrightarrow{not} \in W \otimes S \otimes S \otimes W,\ \sigma^r \cdot j \cdot j^l \cdot \sigma)$$

These are obtained by making the semantic assumption that the vector spaces for $j$ and $s$ are the same,
that is $S$, and that the vector spaces for $n$ and $\sigma$ are the same, that is $W$. The first assumption is justified by
the fact that transitive and infinitive transitive verbs have the same meaning, hence live in the same
vector space. The second assumption is justified by the fact that the $\sigma$'s are placeholders for the subject.
The pregroup morphism corresponding to the grammatical reduction of the sentence is as follows:

$$\alpha = (1_s \cdot \epsilon^l_j \cdot \epsilon^l_j) \circ (\epsilon^r_n \cdot 1_s \cdot 1_{j^l} \cdot \epsilon^r_\sigma \cdot 1_j \cdot 1_{j^l} \cdot \epsilon^r_\sigma \cdot 1_j \cdot \epsilon^l_n)$$

The linear map $f$ corresponding to the above is:

$$(1_S \otimes \epsilon_S \otimes \epsilon_S) \circ (\epsilon_W \otimes 1_S \otimes 1_S \otimes \epsilon_W \otimes 1_S \otimes 1_S \otimes \epsilon_W \otimes 1_S \otimes \epsilon_W)$$

The meaning of 'not' is generated from a linear map $not : S \to S$, which takes a sentence and negates it.
The meaning of 'do' is generated from the identity $1_S$. These are as follows:

$$\overrightarrow{do} := (1_W \otimes ((1_S \otimes 1_S) \circ \eta_S) \otimes 1_W) \circ \eta_W \qquad \overrightarrow{not} := (1_W \otimes ((1_S \otimes not) \circ \eta_S) \otimes 1_W) \circ \eta_W$$

The corresponding computations provide the following meaning vector for the sentence [13, 52]:

$$not(\overrightarrow{men\ kill\ dogs})$$

This is the negation of the meaning of the positive version of the sentence.

So far we have not said anything about the concrete shape of $W$ and $S$. Moreover, for the general setting
to work, we need concrete vectors for words with compound types, whose vectors live in tensor spaces
and cannot be directly obtained from distributional models. For example, the verb lives in $W \otimes S \otimes W$ and
has to act on the subject and object, the adjective lives in $W \otimes W$ and has to modify the noun, the negation
word 'not' is a linear map that acts on the abstract space $S$ and has to negate the meaning of a sentence, and
so on. Previous work presented some solutions for the limit truth-theoretic case, where $S$ was the two-dimensional
space spanned by $|1\rangle$ and $|0\rangle$, and 'not' was taken to be the linear map corresponding to the matrix of
the swap gate [13]. Later, $S$ was also taken to be a high-dimensional vector space concretely built from
the distributional models [21, 22]. The next section summarises these results.

5 Concrete Spaces and Experimental Results 

In previous work, we provided an algorithm for building vectors for words with compound types and 
instantiated it for the particular case of intransitive and transitive sentences with simple subjects and 
objects [21, 22]. Work in progress extends these cases to adjective-noun phrases in transitive sentences
as subject and object. We evaluated the resulting vectors on a disambiguation task performed on real data 
obtained from the British National Corpus (BNC). This corpus consists of about 6 million sentences and 
500,000 unique lemmas. 

5.1 Concrete Constructions 

We build a vector space $N$ from the BNC by taking its 2000 most frequently occurring lemmas as basis vectors $\vec{n}_i$;
this restriction is both for computational purposes and also to be able to compare our results to related
work [40]. Vectors of $N$ are built by counting co-occurrence and normalizing by TF/IDF. We take $W$ to
be $N$ and $S$ to be $N \oplus (N \otimes N) \oplus (N \otimes N \otimes N)$. In the latter, the first $N$ encodes meanings of intransitive
sentences; these are unary relations; $N \otimes N$ encodes meanings of transitive sentences, which are binary
relations, and $N \otimes N \otimes N$ is for meanings of ditransitive sentences, which are ternary relations. These
relations are denoted by weighted sets of singletons, pairs, and triples, respectively. The set of singletons 
for an intransitive verb denotes the subjects to which the verb has been applied throughout the corpus. 
The set of pairs denotes the subject-object pairs that have been related by a transitive verb, etc. The 
weights represent the extent according to which the verb has acted on or related the nouns; these are 
learnt from the corpus in the following two ways. These are instantiations of a general procedure for 
building vectors for a word of any compound type ETTl . 

1. Categorical (1). The meaning vector of an intransitive verb $\overrightarrow{v} \in N$ is $\sum_i c_i\, \vec{n}_i$, where the weights $c_i$ are
obtained by summing the vectors of all the subjects of $v$ throughout the BNC. The meaning vector
of a transitive verb $\overrightarrow{tv} \in N \otimes N$ is $\sum_{ij} c_{ij}\, \vec{n}_i \otimes \vec{n}_j$, where the weights $c_{ij}$ come from the sum of the tensor products
of all the subjects and objects that $tv$ has related in the BNC, and similarly for ditransitive verbs.

2. Categorical (2). Here we simply take the Kronecker products of the context vectors of the verbs;
abusing notation, we still refer to the result as $\overrightarrow{v}$. So for the intransitive verb, this is the vector obtained
by a normalized count of co-occurrence, $\overrightarrow{v} \in N$. For the transitive verb it is $\overrightarrow{v} \otimes \overrightarrow{v} \in N \otimes N$, and for
the ditransitive verb it is $\overrightarrow{v} \otimes \overrightarrow{v} \otimes \overrightarrow{v} \in N \otimes N \otimes N$.

The above vectors are embedded in the spaces prescribed for them by a DisCoCat, that is $N \otimes S$ for $\overrightarrow{v}$
and $N \otimes S \otimes N$ for $\overrightarrow{tv}$, by diagonalisation, that is, by padding the non-diagonal elements with zeros.
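The two constructions for transitive verbs can be sketched as follows. This is only an illustration of the summation and Kronecker-product steps described above, under the assumption that noun and verb context vectors are already available as coordinate lists; the function names and the toy vectors are hypothetical, and the diagonal embedding into $N \otimes S \otimes N$ is not shown.

```python
def outer(u, v):
    """Tensor (Kronecker) product of two coordinate vectors, as a matrix."""
    return [[a * b for b in v] for a in u]


def categorical_1_transitive(subject_object_pairs):
    """Categorical (1): sum subject (x) object over every pair the verb relates."""
    dim = len(subject_object_pairs[0][0])
    total = [[0.0] * dim for _ in range(dim)]
    for subj, obj in subject_object_pairs:
        for i, row in enumerate(outer(subj, obj)):
            for j, value in enumerate(row):
                total[i][j] += value
    return total


def categorical_2_transitive(verb_context):
    """Categorical (2): Kronecker product of the verb's context vector with itself."""
    return outer(verb_context, verb_context)


# Hypothetical corpus evidence in a 2-dimensional noun space.
pairs = [([1.0, 0.0], [0.0, 1.0]),
         ([1.0, 1.0], [1.0, 0.0])]
verb_cat1 = categorical_1_transitive(pairs)
verb_cat2 = categorical_2_transitive([1.0, 2.0])
print(verb_cat1, verb_cat2)
```

Categorical (1) depends on the arguments the verb was observed with, whereas Categorical (2) depends only on the verb's own context vector.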

5.2 Evaluation Task, Dataset, and Results 

We evaluated these concrete methods on a disambiguation task. The general idea behind this task is that 
some verbs have more than one meaning and the sentences in which they appear disambiguate them. 
Suppose a verb v has two meanings a and b and we want to decide whether v means a or b whenever it 
occurs in a sentence s. To implement this task, we chose 10 ambiguous transitive verbs from the most
frequent verbs of the BNC. For each verb, two different non-overlapping meanings were retrieved via
the JCN measure of information content synonymy applied to WordNet synsets [25]. For instance, one
of our chosen verbs was 'meet', for which we obtained meaning a: 'visit' and meaning b: 'satisfy'. For
each original verb, ten sentences containing that verb (in the same role) were retrieved from the BNC; for
example, one such sentence for v: 'meet' is s: 'the system met the criterion'. For each such sentence, we
generated two other sentences by substituting the verb of the sentence by a and b, respectively. For
instance, 'the system satisfied the criterion' and 'the system visited the criterion' were generated for the
first meaning of 'meet'. This procedure provided us with 200 pairs of sentences. Note that the generated
sentences only make sense for the correct meaning of the verb; for instance, 'the system satisfied the
criterion' does make sense, whereas 'the system visited the criterion' does not. The goal is to verify that 
the sentences 'the system met the criterion' and 'the system satisfied the criterion' have a high degree 
of semantic similarity, whereas the sentences 'the system met the criterion' and 'the system visited the 
criterion' have a low degree of similarity. The result of this verification disambiguates the verb. For the 
case of 'meet', we are verifying that it means 'satisfy' (and not 'visit') in the sentence 'the system met 
the criterion'. 

We experimented with two datasets, one for intransitive sentences from [40] and an extension of it for
transitive sentences, as described above (the extension to sentences with adjective-noun phrases as
subject and object is straightforward and preserves the results). An example of the second dataset is
provided in Table 3.





     Sentence 1                      Sentence 2
1    the system met the criterion    the system visited the criterion
2    the system met the criterion    the system satisfied the criterion
3    the child met the house         the child visited the house
4    the child met the house         the child satisfied the house
5    the child showed interest       the child pictured interest
6    the child showed interest       the child expressed interest
7    the map showed the location     the map pictured the location
8    the map showed the location     the map expressed the location

Table 3: Sample Sentence Pairs from the Transitive Dataset.



We built vectors for nouns by using the usual distributional co-occurrence method. We then built vectors for
verbs using both of the methods described above, and finally built vectors for sentences using the DisCoCat
prescription. After the sentence vectors were built, we measured the similarity of each pair using
the cosine of the angle between them, a real number in [0, 1]. In order to judge the performance
of our method, we followed guidelines from related work [40]. We distributed our dataset among 25
volunteers, who were asked to rank each pair based on how similar they thought the two sentences were. The ranking
was between 1 and 7, where 1 was almost dissimilar and 7 almost identical. To be in line with related
work [40], each pair was also given a HIGH or LOW classification by us. However, these scores are
solely based on our personal judgements and on their own they do not provide a very reliable measure
of comparison. The correlation of the model's similarity judgements with the human judgements was
calculated using Spearman's $\rho$. This is a rank correlation coefficient ranging from -1 to 1 and provides a
more robust metric (in contrast with the LOW/HIGH metric) by which models are ranked and compared.
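The two measures used in this evaluation can be sketched in a few lines. This is a generic illustration, not the evaluation script used for the experiments: Spearman's $\rho$ is computed as the Pearson correlation of average ranks, and the scores below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two coordinate vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical model similarities and human scores (1 to 7) for four sentence pairs.
model_scores = [0.9, 0.2, 0.6, 0.4]
human_scores = [7, 1, 5, 6]
rho = spearman_rho(model_scores, human_scores)
print(rho)
```

Because $\rho$ compares rankings only, it does not depend on the different scales of the cosine values and the 1 to 7 human scores.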

The results of these calculations for our datasets are presented in Table 4. The additive and multiplicative
rows have, as composition operation, vector addition and component-wise multiplication. The Baseline
is from a non-compositional approach; it is obtained by comparing the verb vectors of each pair
directly and ignoring their subjects and objects. The UpperBound is set to be inter-annotator agreement.
According to the $\rho$ measure, both of our methods outperform the other methods.

Model              High    Low     ρ
Baseline           0.47    0.44    0.16
Add                0.90    0.90    0.05
Multiply           0.67    0.59    0.17
Categorical (1)    0.73    0.72    0.21
Categorical (2)    0.34    0.26    0.28
UpperBound         4.80    2.49    0.62

Table 4: Results of the 1st and 2nd Compositional Disambiguation Experiments.

6 A Compositional Distributional Model of Meaning on Lambek Monoids 

The contribution of this paper is to extend the DisCoCat model from pregroups to Lambek monoids. 
Rather than pairing FVect with a monoidal category, we will rely on a functorial passage from a monoidal 
category to FVect. There are two reasons to opt for a functorial passage. Firstly, it makes a closer connection
to the original semantic models of natural language [41], which were based on a homomorphic
passage from sentence formation rules to set theoretic operations. In our model, this homomorphic pas- 
sage is formalised via a functor between categories of grammatical reductions and meaning. Contrary to 
those models [41], however, our model is not limited to sets and set-theoretic operations and is gener- 
alised to vectors and vector composition operations. This brings us to our second reason: a homomorphic 
passage to the category of vector spaces is not a one-off development especially tailored for our purposes. 
It is an example of a more general construction, namely, a passage long-known in Topological Quantum 
Field Theory (TQFT). This general passage was first developed in [4] in the context of TQFT and was 
given the name 'quantisation', as it adjoins 'quantum structure' (in terms of vectors) to a purely topo- 
logical entity, namely the cobordisms representing the topology of manifolds. Later, this passage was 
generalised to abstract mathematical structures and recast in terms of functors whose co-domain was 
FVect 121 H9|. This is exactly what is happening in our semantic framework: the sentence formation 
rules are formalised using type-logics and assigned quantitative values in terms of vector composition 
operations. This procedure makes our passage from grammatical structure to vector space meaning a 
'quantisation' functor. Hence, one can say that what we are developing here is a grammatical quantum 
field theory for Lambek monoids. The detailed constructions of this paper can be worked out in a similar 
fashion for the pregroup-based framework of previous work [13, 14].

6.1 Monoidal Bi-Closed Categories

A Lambek monoid is a monoidal bi-closed category. Similar to the case of pregroups, its objects are 
the elements of the partial order and its morphisms are provided by the ordering relation. The monoid 
multiplication is a non-symmetric tensor, whose residuals are the left and right implications. To see this, 
we recall some definitions. 

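A monoidal bi-closed category is a category with a monoidal tensor $\otimes$ and its unit $I$, such that for all
pairs of objects $A, B$ of the category, there exist a pair of objects $A \multimap B$, $A \multimapinv B$ and a pair of morphisms
as follows:

$$ev^l_{A,B} : A \otimes (A \multimap B) \to B \qquad ev^r_{A,B} : (A \multimapinv B) \otimes B \to A$$

These morphisms are referred to as left and right evaluations. They are such that for every pair of arrows
$f : (A \otimes C) \to B$ and $g : (C \otimes B) \to A$ there exist two unique morphisms as follows:

$$\Lambda^l(f) : C \to A \multimap B \qquad \Lambda^r(g) : C \to A \multimapinv B$$

These morphisms are referred to as left and right currying; they make the following diagrams commute:

$$\big(A \otimes C \xrightarrow{\ 1_A \otimes \Lambda^l(f)\ } A \otimes (A \multimap B) \xrightarrow{\ ev^l_{A,B}\ } B\big) = \big(A \otimes C \xrightarrow{\ f\ } B\big)$$

$$\big(C \otimes B \xrightarrow{\ \Lambda^r(g) \otimes 1_B\ } (A \multimapinv B) \otimes B \xrightarrow{\ ev^r_{A,B}\ } A\big) = \big(C \otimes B \xrightarrow{\ g\ } A\big)$$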



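Given a morphism $f : A \to B$, its name $\ulcorner f \urcorner : I \xrightarrow{\ \Lambda^l(f)\ } A \multimap B$ is obtained by currying the morphism
$(A \otimes I) \xrightarrow{\ f\ } B$ (dropping the precomposition with the unit isomorphism $A \otimes I \to A$). In order to be
coherent with the above left-right notation, we define a left and a right name for $f$, obtained by currying
the morphisms $(A \otimes I) \xrightarrow{\ f\ } B$ and $(I \otimes A) \xrightarrow{\ f\ } B$, as follows:

$$\ulcorner f \urcorner^l : I \to A \multimap B \qquad \ulcorner f \urcorner^r : I \to B \multimapinv A$$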



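Evaluating these names makes the following diagrams commute:

$$\big(A \otimes I \xrightarrow{\ 1_A \otimes \ulcorner f \urcorner^l\ } A \otimes (A \multimap B) \xrightarrow{\ ev^l_{A,B}\ } B\big) = \big(A \otimes I \xrightarrow{\ f\ } B\big)$$

$$\big(I \otimes A \xrightarrow{\ \ulcorner f \urcorner^r \otimes 1_A\ } (B \multimapinv A) \otimes A \xrightarrow{\ ev^r_{A,B}\ } B\big) = \big(I \otimes A \xrightarrow{\ f\ } B\big)$$

In other words, we have the following two equations:

$$ev^l_{A,B} \circ (1_A \otimes \ulcorner f \urcorner^l) = f \qquad ev^r_{A,B} \circ (\ulcorner f \urcorner^r \otimes 1_A) = f$$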

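The names of the identity morphism $1_A : A \to A$ are obtained as special cases of the above construction.
These are defined as follows:

$$\ulcorner 1_A \urcorner^l : I \xrightarrow{\ \Lambda^l(1_A)\ } A \multimap A \qquad \ulcorner 1_A \urcorner^r : I \xrightarrow{\ \Lambda^r(1_A)\ } A \multimapinv A$$

Evaluating the above leads to two similar commutative diagrams; the equations corresponding to these
diagrams provide us with the monoidal version of yanking:

$$ev^l_{A,A} \circ (1_A \otimes \ulcorner 1_A \urcorner^l) = 1_A \qquad ev^r_{A,A} \circ (\ulcorner 1_A \urcorner^r \otimes 1_A) = 1_A$$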



6.2 A Quantisation Functor for Lambek Monoids 



The quantisation functor preserves all the syntactic structure but "forgets" the order of the words. This
order will be taken care of in our explicit definition of the meaning vector of a sentence in Definition 6.2.

Definition 6.1. Given a Lambek monoid $\mathcal{L}$ and the category of finite dimensional vector spaces FVect
over $\mathbb{R}$, the quantisation functor $Q : \mathcal{L} \to$ FVect is a strongly monoidal functor, satisfying the following:

$$Q(1) := \mathbb{R} \tag{3}$$
$$Q(a \cdot b) = Q(b \cdot a) := Q(a) \otimes Q(b) \tag{4}$$
$$Q(a \multimap b) = Q(b \multimapinv a) := Q(a) \Rightarrow Q(b) \tag{5}$$

The tensor product in FVect is symmetric, so (4) forgets the order and (5) collapses the two implications
of $\mathcal{L}$ to just one. Also, FVect only has one evaluation and currying map as follows:

$$ev_{A,B} : A \otimes (A \Rightarrow B) \to B \qquad \Lambda(f) : C \to (A \Rightarrow B), \ \text{for } f : (A \otimes C) \to B$$

The closed object $V \Rightarrow W$ is the set of all linear maps from $V$ to $W$, made into a vector space in the
standard way, via the isomorphism $V \Rightarrow W \cong V^* \otimes W \cong V \otimes W$. Hence, the closed structure of
FVect determines its compact structure. Given this fact, the evaluation map of FVect is the same as the
corresponding co-unit of the compact adjunction. Hence, for $V, W$ respectively spanned by $\{\vec{v}_i\}_i$, $\{\vec{w}_j\}_j$,
and a vector $\vec{v} \in V$, the morphism

$$V \otimes (V \Rightarrow W) \xrightarrow{\ ev_{V,W}\ } W$$

is as in the compact closed case, that is:

$$ev_{V,W}\Big(\vec{v} \otimes \sum_{ij} c_{ij}\, \vec{v}_i \otimes \vec{w}_j\Big) = \sum_{ij} c_{ij}\, \langle \vec{v} \mid \vec{v}_i \rangle\, \vec{w}_j$$



The quantisation of atomic types yields atomic vector spaces: for a noun phrase $n$ we stipulate $Q(n) := N$
and for a declarative sentence $s$ we stipulate $Q(s) := S$. The quantisation of compound types yields
closed vector spaces. The quantisation of an intransitive verb, which has the type $n \multimap s$, is computed as
follows:

$$Q(n \multimap s) = Q(n) \Rightarrow Q(s) := N \Rightarrow S$$

The quantisation of an adjective, which has the type $n \multimapinv n$, is computed as follows:

$$Q(n \multimapinv n) = Q(n) \Rightarrow Q(n) := N \Rightarrow N$$

The quantisation of a transitive verb, which has the type $(n \multimap s) \multimapinv n$, is computed as follows:

$$Q((n \multimap s) \multimapinv n) = Q(n) \Rightarrow (Q(n) \Rightarrow Q(s)) := N \Rightarrow (N \Rightarrow S)$$
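The action of the quantisation functor on types can be mimicked by a small recursive function over a type syntax tree. The sketch below is an illustration only: the class names `Atom`, `Tensor`, `Under` and `Over`, the string rendering, and the chosen dimensions are all hypothetical, and only the object part of $Q$ is modelled, not its action on reduction maps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str              # an atomic type such as "n" or "s"


@dataclass(frozen=True)
class Tensor:              # a . b
    left: object
    right: object


@dataclass(frozen=True)
class Under:               # a -o b : expects an argument of type a on its left
    arg: object
    result: object


@dataclass(frozen=True)
class Over:                # b o- a : expects an argument of type a on its right
    result: object
    arg: object


def quantise(t):
    """Render Q(t); both implications collapse to the single '=>' of FVect."""
    if isinstance(t, Atom):
        return t.name.upper()
    if isinstance(t, Tensor):
        return f"({quantise(t.left)} (x) {quantise(t.right)})"
    return f"({quantise(t.arg)} => {quantise(t.result)})"


def dimension(t, dims):
    """Dimension of Q(t), given the dimensions assigned to the atomic types."""
    if isinstance(t, Atom):
        return dims[t.name]
    if isinstance(t, Tensor):
        return dimension(t.left, dims) * dimension(t.right, dims)
    return dimension(t.arg, dims) * dimension(t.result, dims)


n, s = Atom("n"), Atom("s")
intransitive = Under(n, s)
adjective = Over(n, n)
transitive = Over(Under(n, s), n)
print(quantise(intransitive), quantise(adjective), quantise(transitive))
```

Since tensors and implications both multiply dimensions, swapping the factors of a tensor or the direction of an implication leaves the dimension of the quantised space unchanged, which is one way to see that $Q$ forgets word order.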

Monoidal meaning vectors of strings of words are computed by applying the quantisation of their gram- 
matical reduction map to the tensor products of the quantisations of their words. More precisely, we 
have: 

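Definition 6.2. The monoidal meaning vector of a string $w_1 w_2 \cdots w_n$ consisting of $n$ words is:

$$\overrightarrow{w_1 w_2 \cdots w_n} := Q(f)(\overrightarrow{w_1} \otimes \overrightarrow{w_2} \otimes \cdots \otimes \overrightarrow{w_n})$$

where, for $t_i$ the grammatical type of the word $w_i$, the map $f$ is the monoidal grammatical reduction map
of the string, that is $t_1 \cdot t_2 \cdots t_n \xrightarrow{\ f\ } s$, and $Q(f)$ is defined as follows:

$$Q\big(t_1 \cdot t_2 \cdots t_n \xrightarrow{\ f\ } s\big) := Q(t_1 \cdot t_2 \cdots t_n) \xrightarrow{\ Q(f)\ } Q(s)$$

For example, the $Q(f)$ of an intransitive sentence with $f = n \cdot (n \multimap s) \to s$ is computed as follows:

$$Q\big(n \cdot (n \multimap s) \xrightarrow{\ f\ } s\big) = Q(n \cdot (n \multimap s)) \xrightarrow{\ Q(f)\ } Q(s) \ \cong\ N \otimes (N \Rightarrow S) \xrightarrow{\ ev_{N,S}\ } S$$

The monoidal meaning of an intransitive sentence 'men kill' is as follows:

$$\overrightarrow{men\ kill} := Q(f)(\overrightarrow{men} \otimes \overrightarrow{kill})$$

Given that a vector $\vec{v}$ in a vector space $V$ is represented by the morphism $I \xrightarrow{\ \vec{v}\ } V$, the details of this
computation become as follows:

$$\begin{aligned}
\overrightarrow{men\ kill} &= ev_{N,S}\Big(\big(I \xrightarrow{\ \overrightarrow{men}\ } N\big) \otimes \big(I \xrightarrow{\ \overrightarrow{kill}\ } (N \Rightarrow S)\big)\Big) \\
&= ev_{N,S}\Big(I \otimes I \xrightarrow{\ \overrightarrow{men} \otimes \overrightarrow{kill}\ } N \otimes (N \Rightarrow S)\Big) \\
&= I \cong I \otimes I \xrightarrow{\ \overrightarrow{men} \otimes \overrightarrow{kill}\ } N \otimes (N \Rightarrow S) \xrightarrow{\ ev_{N,S}\ } S \\
&= ev_{N,S}(\overrightarrow{men} \otimes \overrightarrow{kill})
\end{aligned}$$

This morphism picks a sentence vector from $S$; given that the monoidal and compact meanings of the
verb 'kill' coincide and both are representable by $\sum_{ij} c_{ij}\, \vec{n}_i \otimes \vec{s}_j$, the above morphism picks the vector
$\sum_{ij} c_{ij}\, \langle \overrightarrow{men} \mid \vec{n}_i \rangle\, \vec{s}_j$, which is the same vector as the compact meaning vector of 'men kill'.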

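For the transitive sentence, the grammatical reduction map is the composition of two maps as follows:

$$n \cdot ((n \multimap s) \multimapinv n) \cdot n \xrightarrow{\ 1_n \cdot ev^r_{n, n \multimap s}\ } n \cdot (n \multimap s) \xrightarrow{\ ev^l_{n,s}\ } s$$

Hence, $f$ is the composition $ev^l_{n,s} \circ (1_n \cdot ev^r_{n, n \multimap s})$. The quantisation of this composition is the composition
of quantisations, computed as follows:

$$\begin{aligned}
Q\big(ev^l_{n,s} \circ (1_n \cdot ev^r_{n, n \multimap s})\big) &= Q(ev^l_{n,s}) \circ Q(1_n \cdot ev^r_{n, n \multimap s}) && \text{by functoriality of } Q \\
&= ev_{N,S} \circ \big(Q(1_n) \otimes Q(ev^r_{n, n \multimap s})\big) && \text{by monoidality of } Q \\
&= ev_{N,S} \circ (1_N \otimes ev_{N, N \Rightarrow S}) && \text{by functoriality of } Q
\end{aligned}$$

And based on the above, $Q(f)$ is computed as follows for a transitive sentence:

$$Q\big(n \cdot ((n \multimap s) \multimapinv n) \cdot n \xrightarrow{\ f\ } s\big) = N \otimes (N \Rightarrow (N \Rightarrow S)) \otimes N \xrightarrow{\ ev_{N,S} \circ (1_N \otimes ev_{N, N \Rightarrow S})\ } S$$

The monoidal meaning vector of a transitive sentence such as 'men kill dogs' simplifies to the following:

$$\overrightarrow{men\ kill\ dogs} := \big(ev_{N,S} \circ (1_N \otimes ev_{N, N \Rightarrow S})\big)(\overrightarrow{men} \otimes \overrightarrow{kill} \otimes \overrightarrow{dogs})$$

Following a computation similar to that of the intransitive sentence, the above picks the sentence vector
$\sum_{ijk} c_{ijk}\, \langle \overrightarrow{men} \mid \vec{n}_i \rangle\, \langle \vec{n}_k \mid \overrightarrow{dogs} \rangle\, \vec{s}_j$ from $S$; this vector is the same vector as the one obtained in the
compact closed case.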
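The agreement between the two computations can be checked on toy data. The sketch below is illustrative and makes several assumptions: bases are orthonormal, the verb is stored as `verb[i][j][k]` for the coefficients $c_{ijk}$ (subject index, sentence index, object index), and all names and numbers are hypothetical. The monoidal route applies the verb to the object first and then evaluates the resulting predicate on the subject; the compact route contracts both arguments at once.

```python
def verb_on_object(verb, obj):
    """1_N tensor ev_{N, N=>S}: feed the object to the verb, leaving a predicate in N => S."""
    return [[sum(c * o for c, o in zip(row, obj)) for row in plane] for plane in verb]


def predicate_on_subject(pred, subj):
    """ev_{N,S}: feed the subject to a predicate pred[i][j], leaving a vector in S."""
    return [sum(subj[i] * pred[i][j] for i in range(len(subj)))
            for j in range(len(pred[0]))]


def compact_contraction(subj, verb, obj):
    """epsilon_W tensor 1_S tensor epsilon_W: contract subject and object in one step."""
    out = [0.0] * len(verb[0])
    for i, plane in enumerate(verb):
        for j, row in enumerate(plane):
            for k, c_ijk in enumerate(row):
                out[j] += c_ijk * subj[i] * obj[k]
    return out


# Hypothetical toy data with 2-dimensional N and S.
kill = [[[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 2.0], [1.0, 0.0]]]
men = [1.0, 0.0]
dogs = [3.0, 1.0]

monoidal = predicate_on_subject(verb_on_object(kill, dogs), men)
compact = compact_contraction(men, kill, dogs)
print(monoidal, compact)
```

Both routes sum the same products $c_{ijk}\,\langle men \mid n_i \rangle \langle n_k \mid dogs \rangle$, only in a different order, so the results coincide.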

For an adjective-noun phrase, $f$ is $(n \multimapinv n) \cdot n \xrightarrow{\ ev^r_{n,n}\ } n$, hence $Q(f)$ becomes as follows:

$$Q\big((n \multimapinv n) \cdot n \xrightarrow{\ ev^r_{n,n}\ } n\big) = (N \Rightarrow N) \otimes N \xrightarrow{\ ev_{N,N}\ } N$$

whose concrete vector is the same as in the compact closed case, that is $\sum_{lm} c_{lm}\, \langle \overrightarrow{noun} \mid \vec{n}_l \rangle\, \vec{n}_m$.

For a sentence with an adjective-noun phrase, such as 'men kill cute dogs', $Q(f)$ is computed as follows:

$$Q\Big(n \cdot ((n \multimap s) \multimapinv n) \cdot (n \multimapinv n) \cdot n \xrightarrow{\ ev^l_{n,s} \circ (1_n \cdot ev^r_{n, n \multimap s}) \circ (1_{n \cdot ((n \multimap s) \multimapinv n)} \cdot ev^r_{n,n})\ } s\Big) =$$

$$N \otimes (N \Rightarrow (N \Rightarrow S)) \otimes (N \Rightarrow N) \otimes N \xrightarrow{\ ev_{N,S} \circ (1_N \otimes ev_{N, N \Rightarrow S}) \circ (1_{N \otimes (N \Rightarrow (N \Rightarrow S))} \otimes ev_{N,N})\ } S$$

The computation of the monoidal meaning vector of the above is done by substituting the meaning vector
of the adjective-noun phrase for the meaning of the object in the meaning vector of the transitive sentence.
It provides us with the same meaning vector as in the compact closed case.

To compute the monoidal vector meaning of the sentence 'men do not kill cute dogs', we take 'do' to
be the name of the identity morphism on $N \Rightarrow S$ and 'not' to be the name of $not$ on the morphism
$(N \Rightarrow S) \to (N \Rightarrow S)$. That is:

$$\overrightarrow{do} := \ulcorner 1_{N \Rightarrow S} \urcorner : \mathbb{R} \to ((N \Rightarrow S) \Rightarrow (N \Rightarrow S))$$
$$\overrightarrow{not} := \ulcorner not \urcorner : \mathbb{R} \to ((N \Rightarrow S) \Rightarrow (N \Rightarrow S))$$

The meaning vector of the sentence will then simplify to the following:

$$ev_{N,S}\big(\overrightarrow{men} \otimes not(\overrightarrow{kill}(-, \overrightarrow{dogs}))\big)$$

So the monoidal meaning of a negative sentence is obtained by first applying the meaning of the verb to
the object, then negating it, then applying the result to the subject. This procedure is different from the
compact closed case, which was obtained by first applying the meaning of the verb to the meanings of
subject and object, then negating it. The monoidal meaning results in the following vector:

$$\Big\langle \overrightarrow{men} \ \Big|\ not\Big(\sum_{ijk} c_{ijk}\, \vec{n}_i\, \langle \vec{n}_k \mid \overrightarrow{dogs} \rangle\, \vec{s}_j\Big) \Big\rangle$$

Whereas the compact meaning results in the following vector:

$$not\Big(\sum_{ijk} c_{ijk}\, \langle \overrightarrow{men} \mid \vec{n}_i \rangle\, \langle \vec{n}_k \mid \overrightarrow{dogs} \rangle\, \vec{s}_j\Big)$$

These two vectors only become equivalent in special cases, for instance one in which the meaning of
$not$ can only act on $S$ (and not on $N$). This happens in the truth-theoretic case, in which $N$ has many
dimensions, $S$ has only two dimensions, and $not$ swaps the basis vectors, that is, it is the linear map
corresponding to the matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. In this case, $not$ can be applied to any vector in $S$, but it is not
defined for the vectors in $N$. If $not$ is also defined for $N$, for instance if it is a certain permutation of basis
vectors, then in the computation of the monoidal meaning vector of the sentence, it can either apply to
the basis elements of $N$, that is to $\vec{n}_i$, or to the basis elements of $S$, that is to $\vec{s}_j$. Only the latter will
provide a result which is the same as the meaning vector of the sentence in the compact case.
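The special case in which both orders of negation agree can be illustrated numerically. The sketch below assumes the truth-theoretic setting described above, with a two-dimensional $S$ on which negation is the swap matrix, and lets negation act on the $S$ index only; the helper names and the toy verb are hypothetical.

```python
SWAP = [[0.0, 1.0],
        [1.0, 0.0]]  # negation on the two-dimensional sentence space S


def apply_matrix(matrix, vec):
    """Apply a linear map on S, given as a matrix, to a vector of S."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def verb_on_object(verb, obj):
    """Feed the object to verb[i][j][k], leaving a predicate pred[i][j] in N => S."""
    return [[sum(c * o for c, o in zip(row, obj)) for row in plane] for plane in verb]


def predicate_on_subject(pred, subj):
    """Feed the subject to a predicate, leaving a vector in S."""
    return [sum(subj[i] * pred[i][j] for i in range(len(subj)))
            for j in range(len(pred[0]))]


# Hypothetical toy data: N and S are both 2-dimensional.
kill = [[[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 2.0], [1.0, 0.0]]]
men = [1.0, 2.0]
dogs = [1.0, 0.0]

predicate = verb_on_object(kill, dogs)

# Compact order: build the positive sentence, then negate it.
compact = apply_matrix(SWAP, predicate_on_subject(predicate, men))

# Monoidal order: negate the predicate (on its S index only), then apply it to the subject.
negated_predicate = [apply_matrix(SWAP, row) for row in predicate]
monoidal = predicate_on_subject(negated_predicate, men)
print(compact, monoidal)
```

The two results coincide here because the swap touches only the sentence index, which the subject contraction leaves alone; a negation acting on the $N$ index instead would in general break this agreement.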



7 Diagrammatic Reasoning in Monoidal Bi-Closed Categories 

One of the principal advantages of a DisCoCat is that it is equipped with the sound and complete diagrammatic
calculus of compact closed categories [27]. We refer the reader to previous
work for further details, but in a nutshell, this diagrammatic calculus depicts the information flows that happen among the
words of a sentence and which produce a meaning for the full sentence. This information flow happens
when certain objects cancel out via yanking; this procedure is depicted by pulling sequences of connected
cup and cap structures, representing $\epsilon$ and $\eta$ maps, and turning them into straight lines. If the information
flow happens in stages, the process is depicted via equivalences between the resulting diagrams of
each stage. The equivalences translate into equations between the categorical morphisms corresponding to
each diagram and provide a symbolic proof of the claim that, for instance, a sentence is grammatical.
The yanking operations have also been useful to elegantly describe the flow of information in quantum
protocols such as teleportation, as depicted in Figure 2 (taken from earlier work).

Figure 2: Diagram of Information Flow in the Teleportation Protocol

Figure 3: Diagram of Information Flow in the Negative Transitive Sentence

In a linguistic context, the yanking operations depict the flow of information between words, as depicted
in Figure 3 from [13, 52].

The ability to compute with diagrams is not lost in the move from compact to monoidal bi-closed categories.
A diagrammatic calculus for these categories has been developed by Baez and Stay. Similar
to the graphical calculus of compact closed categories, the graphical calculus of monoidal bi-closed categories
can be applied to depict both the grammatical reduction of Lambek monoids and the flow of
information between meanings of the words in a sentence.

The basic constructs of the diagrammatic language of a monoidal bi-closed category are shown in Figures
4 and 5. We read these diagrams from top to bottom. Objects of the category are depicted by arrows. For
instance, the left hand side arrow of Figure 4 depicts object $A$. Morphisms of the category are depicted
by 'blobs' with 'input' and 'output' arrows standing for their domain and codomain objects. The right
hand side blob of Figure 4 depicts the morphism $f$, which has domain $A$ and codomain $B$. Since an identity
morphism has the same domain and codomain, it is depicted in the same way as the object corresponding
to it. The identity morphism $id_A$ is denoted by the same diagram as that of object $A$, e.g. in the left hand
side diagram of Figure 4.

[Diagrams: an arrow labelled $A$, depicting the object $A$ and the identity $id_A$ (left); a blob labelled $f$ with input arrow $A$ and output arrow $B$, depicting $f : A \to B$ (right)]

Figure 4: Basic Diagrammatic Language Constructs (1)



The tensor product of two objects corresponds to the side by side placement of their arrows, for instance
the tensor product $A \otimes B$ of two objects $A$ and $B$ is depicted by putting the arrow depicting object $A$
beside the arrow depicting object $B$, as shown in the left hand side diagram of Figure 5. Right and left
implications $A \multimap B$ and $A \multimapinv B$ are depicted by side by side placement and clasping of their arrows,
pointing in opposite directions. For instance, the right implication $A \multimap B$ is depicted by putting the
arrow corresponding to $A$ pointing upwards with the arrow corresponding to $B$ pointing downwards,
with the arrow of $B$ being clasped to the arrow of $A$. The intuition behind the directions of the arrows
is that in a right implication, the antecedent has a negative role and the consequent has a positive role;
this reversal of roles is the reason behind the classical logical equivalence $A \to B = \neg A \vee B$. The clasp is
a restriction: it is there to make us treat the implication as one entity and to prevent us from treating each
of the arrows separately. As a result, for example, we cannot apply a function to one arrow, e.g. $A$, and
not the other, e.g. $B$. The diagram corresponding to the left implication $A \multimapinv B$ is the opposite of that of
the right implication. These two implications are depicted in the last two diagrams of Figure 5.

[Diagrams: the tensor product $A \otimes B$ (left); the right implication $A \multimap B$ and the left implication $A \multimapinv B$ (last two)]

Figure 5: Basic Diagrammatic Language Constructs (2)



There might be more than one way of representing an object or morphism in this diagrammatic calculus.
If so, these different ways are considered to be 'equal' and their resulting representations are considered
to be 'equivalent'. Figure 6 shows three of the main Baez-Stay diagrammatic equivalences. They express
the fact that one can either draw the left hand side or equivalently the right hand side diagram to represent
their corresponding morphism. For instance, one can either draw an arrow labeled with $A \otimes B$ for the
tensor object $A \otimes B$ (or the identity arrow on it, i.e. $id_{A \otimes B}$) or two side-by-side arrows, one labeled with
$A$ and the other with $B$. These equivalences are not the same as the equivalences of other diagrammatic
calculi [55]. For instance, there is no connection between these and the coherence conditions of the
category.

[Diagrams: equivalent representations of identity arrows, including $id_{A \otimes B}$ and $id_{A \multimap B}$]

Figure 6: Equivalent Diagrammatic Representations

The equivalences for the left and right evaluations $ev^l$ and $ev^r$ are shown in Figure 7. Each of the diagrams
of each equivalence inputs two arrows and outputs one arrow. The left hand side diagrams do not depict any
information flow: they just show which objects the evaluation morphism is being applied to. The internal
structure of the evaluation morphism is depicted inside the blob of the right hand side diagrams.
These diagrams depict how the information flows between an object and the implication, leading to an
output. The flow represents the fact that, for instance, for the tensor product $A \otimes (A \multimap B)$ to be evaluated
and to produce the output $B$, information has to flow from object $A$ to the antecedent of $A \multimap B$, and
similarly for the left implication.

$$ev^l_{A,B} : A \otimes (A \multimap B) \to B \qquad ev^r_{A,B} : (B \multimapinv A) \otimes A \to B$$

Figure 7: Diagrams for Left and Right Evaluation



The diagrams for left and right currying are shown in Figure 8. Here, the left-hand side diagrams are the
morphisms that are being curried and the right-hand side diagrams depict the flow of information that
happens in the process of currying. What happens outside the blob of the right-hand side diagram has
no flow and just lists the objects to which the currying morphism is being applied. The internal structure
of the blobs depicts the corresponding information flows. For instance, the diagram of Λ^l(f) shows that
in order to produce the morphism C → A ⊸ B, the information encoded in A and C has to flow to
f : A ⊗ C → B. For this to happen, A first has to detach itself from the clasp A ⊸ B, a process that can
only happen inside a blob.



f : A ⊗ C → B        Λ^l(f) : C → A ⊸ B

g : C ⊗ B → A        Λ^r(g) : C → A ⟜ B




Figure 8: Diagrams for Left and Right Currying 
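The two currying operations can be mimicked with ordinary higher-order functions. The sketch below is only an analogy: it models morphisms as Python functions of two arguments rather than as linear maps, and the names `curry_left`, `curry_right`, `f` and `g` are hypothetical.

```python
def curry_left(f):
    """Λ^l: turn f : A ⊗ C -> B, called as f(a, c), into C -> (A ⊸ B)."""
    return lambda c: (lambda a: f(a, c))


def curry_right(g):
    """Λ^r: turn g : C ⊗ B -> A, called as g(c, b), into C -> (A ⟜ B)."""
    return lambda c: (lambda b: g(c, b))


# Hypothetical morphisms; strings are used so that argument order stays visible.
def f(a, c):
    return "f(" + a + "," + c + ")"


def g(c, b):
    return "g(" + c + "," + b + ")"


curried_f = curry_left(f)("c")   # an element of A ⊸ B, waiting for an A on its left
curried_g = curry_right(g)("c")  # an element of A ⟜ B, waiting for a B on its right

left_value = curried_f("a")
right_value = curried_g("b")
```

The curried function first consumes C and returns something that still waits for the remaining argument, which corresponds to the clasp in the diagrams.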



The diagrams for the equivalences corresponding to the evaluation of the names of morphisms, which
lead to yanking, are shown in Figure 9 (note that in these diagrams we have stretched the caps inside the
outer blobs for reasons of space). They show that if we first curry f to produce A ⊸ B and then evaluate
the result with A, the output is the same as the output of applying f to A, that is, B.




Figure 9: Diagrams for Left and Right Yanking 
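Written equationally, the yanking equivalences are instances of the standard evaluation-after-currying law of a monoidal bi-closed category. The block below states that law in the notation used above; the macro `\rimp` for the implication ⟜ is a typesetting choice made for this sketch, and the subscripts on ev^r follow the convention ev^r_{A,B} : (B ⟜ A) ⊗ A → B.

```latex
% \rimp is an ad hoc macro for the implication written as a mirrored \multimap.
\newcommand{\rimp}{\mathrel{\circ\!{-}}}

% Left currying followed by left evaluation, for f : A \otimes C \to B:
\[
  \mathrm{ev}^{l}_{A,B} \circ \bigl(\mathrm{id}_A \otimes \Lambda^{l}(f)\bigr) = f ,
  \qquad \Lambda^{l}(f) : C \to A \multimap B .
\]

% Right currying followed by right evaluation, for g : C \otimes B \to A:
\[
  \mathrm{ev}^{r}_{B,A} \circ \bigl(\Lambda^{r}(g) \otimes \mathrm{id}_B\bigr) = g ,
  \qquad \Lambda^{r}(g) : C \to A \rimp B .
\]
```

Taking C to be the monoidal unit I makes Λ^l(f) the name of f, which, up to the unit isomorphism, gives the case described in the text: evaluating the name of f with A yields the same result as applying f to A.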



8 Monoidal Bi-Closed Diagrams and Flow of Meaning in Language 

The diagram of the intransitive sentence 'men kill' consists of just one evaluation: ev^l_{n,s}. It shows how
information has to flow from the subject 'men' to the verb 'kill' to form the intransitive sentence 'men
kill'. In other words, the meaning of an intransitive sentence is obtained by applying its verb to its subject.
The diagram for this application is depicted in Figure 10. For reasons of space, we use lowercase letters
for both the grammatical types and their interpretations.

For the transitive sentence 'men kill dogs', we first depict the subtype n ⊸ s of the verb with a single
arrow and do an ev^r_{n ⊸ s, n} with the object. Then we use the equivalence for the right implication and
replace the resulting object n ⊸ s with its equivalent clasped diagram, which has two arrows of types
n and s respectively. The replacement procedure is marked with a dotted line. At this point, we do an
ev^l_{n,s} on the implication diagram n ⊸ s and the subject n. The full diagram in Figure 10 shows that,
to obtain a meaning for a transitive sentence, information has to flow in two steps: first from the object
'dogs' to the verb 'kill' and then from the subject 'men' to the resulting verb phrase 'kill dogs'. Note that
this information flow was done in one step in the compact closed setting, where the subject and object
interacted with the verb simultaneously. In the monoidal case, by contrast, we have to wait
for the verb to interact with the object before it can interact with the subject.

Men    kill    dogs                Men    kill
n    (n ⊸ s) ⟜ n    n              n    n ⊸ s

Figure 10: Diagram for 'men kill' and 'men kill dogs'
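The two-step flow for 'men kill dogs' can be sketched numerically. The example below assumes, purely for illustration, a 2-dimensional noun space, a 2-dimensional sentence space, and a verb of type (n ⊸ s) ⟜ n stored as a three-index array `verb[k][i][j]` (sentence index k, subject index i, object index j). The numbers are made up and are not taken from any corpus.

```python
def apply_to_object(verb, obj):
    """ev^r step: contract the object index, leaving a verb phrase of type n ⊸ s."""
    return [[sum(x * o for x, o in zip(row, obj)) for row in plane]
            for plane in verb]


def apply_to_subject(verb_phrase, subj):
    """ev^l step: contract the subject index, leaving a vector in the sentence space."""
    return [sum(x * s for x, s in zip(row, subj)) for row in verb_phrase]


# Hypothetical toy data.
men = [1, 0]
dogs = [0, 1]
kill = [[[1, 0], [0, 1]],   # verb[0][i][j]
        [[0, 1], [1, 0]]]   # verb[1][i][j]

kill_dogs = apply_to_object(kill, dogs)        # step 1: the verb meets its object
sentence = apply_to_subject(kill_dogs, men)    # step 2: the verb phrase meets its subject
```

The intermediate value `kill_dogs` is the verb phrase that must exist before the subject can interact with it, which is the sequencing that distinguishes the monoidal case from the compact closed one.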

Men    kill    cute    dogs
n    (n ⊸ s) ⟜ n    n ⟜ n    n

Figure 11: Diagram for 'men kill cute dogs'

The diagram for the meaning of 'men kill cute dogs' is obtained by three evaluations, as depicted in
Figure 11. Here, first the adjective acts on the object, then the verb acts on the resulting adjective-noun
object to produce a verb phrase, which is then applied to the subject to produce a sentence. The corresponding
diagrammatic computations are depicted in Figure 12. The left-hand side diagram shows the
verb phrase part of this interaction, whereby 'cute' is applied to 'dogs' and the result is then input to 'kill' as its
object. The result is then applied to the meaning of 'men'; this third application is depicted in the right-hand
side diagram, where the dotted area stands for the general structure of the first and second applications
and serves as a shorthand for the left-hand side diagram. The arrow between the two diagrams shows
that the bottom left side of the first diagram is being inserted into the right-hand side and acted upon.
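The order of the three evaluations can be reproduced at the level of types. The following sketch encodes A ⊸ B and B ⟜ A as tagged tuples and repeatedly contracts the leftmost adjacent pair to which an evaluation applies. The encoding and the greedy strategy are assumptions of this example; it is not a complete parser for the Lambek calculus.

```python
def limp(a, b):
    """The type a ⊸ b: consumes an a on its left and yields b."""
    return ("limp", a, b)


def rimp(b, a):
    """The type b ⟜ a: consumes an a on its right and yields b."""
    return ("rimp", b, a)


def reduce_types(types):
    """Greedily apply ev^l / ev^r to adjacent types; return (remaining types, steps)."""
    types = list(types)
    steps = []
    changed = True
    while changed and len(types) > 1:
        changed = False
        for i in range(len(types) - 1):
            left, right = types[i], types[i + 1]
            if isinstance(right, tuple) and right[0] == "limp" and right[1] == left:
                types[i:i + 2] = [right[2]]   # A ⊗ (A ⊸ B) -> B
                steps.append("ev_l")
                changed = True
                break
            if isinstance(left, tuple) and left[0] == "rimp" and left[2] == right:
                types[i:i + 2] = [left[1]]    # (B ⟜ A) ⊗ A -> B
                steps.append("ev_r")
                changed = True
                break
    return types, steps


# 'men kill cute dogs' with the type assignment of Figure 11.
sentence_types = ["n", rimp(limp("n", "s"), "n"), rimp("n", "n"), "n"]
result, steps = reduce_types(sentence_types)
```

For this sentence the only available first step is the adjective acting on the noun, so the recorded steps follow the order described in the text: adjective, then verb, then subject.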

kill    cute    dogs                        Men    kill    cute    dogs
(n ⊸ s) ⟜ n    n ⟜ n    n                   n    (n ⊸ s) ⟜ n    n ⟜ n    n

Figure 12: Diagram for the Meaning of 'men kill cute dogs'

The meaning vectors of words that are names of linear maps are obtained by currying the corresponding
linear map. For instance, the meaning vector of the auxiliary 'do' is the name of the identity and the
meaning vector of the negation word 'not' is the name of a negation operator, as specified in Section 6.2.
The diagrams corresponding to the meaning vectors of words that are names of certain morphisms are
obtained by currying those morphisms. The diagrams of 'does' and 'not' are depicted in Figure 13.

The meaning vector for 'men do not kill dogs' is obtained by substituting the diagrams of meanings of
'does' and 'not' into the diagram corresponding to the monoidal grammatical reduction of the sentence.
The result is depicted in Figure 14. Here, the diagram on the right is the simplified version of the diagram
on the left, where 'kill' has been applied to 'dogs' and the result has been input to 'not'. This means
that the result is negated and the negation is then applied to the meaning of 'men'. As
mentioned in Section 6.2, this procedure is different from the compact closed case, and this difference,
which basically lies in the order of applications, is manifested in the corresponding diagram of each case.

does    not

Figure 13: Diagrams for 'does' and 'not'
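The order of applications in 'men do not kill dogs', namely not(kill(-, dogs)) applied to 'men', can be sketched as follows. The example assumes a 2-dimensional sentence space with basis (true, false), a negation operator that swaps the two basis vectors, and a made-up verb array `kill[k][i][j]`; these are illustrative assumptions rather than definitions taken from Section 6.2.

```python
def mat_vec(m, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def mat_mul(a, b):
    """Matrix-matrix product a . b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def apply_to_object(verb, obj):
    """Contract the object index of verb[k][i][j], giving a matrix of type n ⊸ s."""
    return [mat_vec(plane, obj) for plane in verb]


# Hypothetical toy data: sentence space has basis (true, false).
men = [1, 0]
dogs = [0, 1]
kill = [[[0, 1], [0, 0]],   # 'true' component: only (men, dogs) is true
        [[1, 0], [1, 1]]]   # 'false' component: every other pair
IDENTITY = [[1, 0], [0, 1]]  # 'does' is the name of the identity
NOT = [[0, 1], [1, 0]]       # assumed negation: swaps true and false

kill_dogs = apply_to_object(kill, dogs)          # kill(-, dogs)
does_kill_dogs = mat_mul(IDENTITY, kill_dogs)    # 'does' leaves the verb phrase unchanged
not_kill_dogs = mat_mul(NOT, does_kill_dogs)     # not(kill(-, dogs))

affirmative = mat_vec(kill_dogs, men)            # 'men kill dogs'
negated = mat_vec(not_kill_dogs, men)            # 'men do not kill dogs'
```

The verb phrase is negated before it meets the subject, which is the order depicted in the simplified diagram.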

If Lambek monoids are interpreted in vector spaces, as demonstrated in Section 6.2, they benefit
from the extra compact structure of vector spaces, for instance the symmetry of the tensor. Baez-Stay diagrams can easily
be turned into compact string diagrams by removing the clasp restrictions and popping the bubbles.
There is, however, a problem: the resulting diagrams are not always the same as the compact diagrams
that were originally drawn for sentences, as demonstrated in the above example. The problem stems
from the restricted form of yanking in the monoidal case.



9 Concluding Remarks and Future Work 

We have shown that the DisCoCat framework [13, 14] need not be committed to the grammatical formalism
of Lambek pregroups [33], which are compact closed. Lambek monoids [31], which are monoidal
bi-closed, also allow a functorial passage to the category of finite dimensional vector spaces, and the diagrammatic
calculus of Baez and Stay [6] can be used to depict the information flow that happens within
sentences. Such a functorial passage is also used in the context of Topological
Quantum Field Theory (TQFT) [4, 5, 29], where it is referred to as quantisation. As future work, we would like to (1) extend the results
of this paper to richer type-logics, and (2) relate their diagrammatic calculi to Baez-Stay diagrams.

Men    not(kill(-, dogs))

Figure 14: Diagram for Meaning of 'men do not kill dogs'

With regard to (1), we would like to develop distributional compositional models of meaning for richer
grammars such as Combinatory Categorial Grammar (CCG) [57] and Lambek-Grishin algebras [44, 51].
These grammars have more expressive power and can formalise larger fragments of natural language,
and in particular English; CCG has been applied to parse large corpora of real data. To extend our quantisation
functor from Lambek monoids to CCG, the cross composition rules need to be given a categorical
semantics. This would perhaps require restricting the application of these rules to subcategories of the
full type-category since, in their general forms, they hold neither in a compact closed nor in a monoidal closed
category. To extend the quantisation functor to Lambek-Grishin algebras, we need to ask the functor to
preserve the operations of the Grishin part of the algebra, as well as their interactions with the Lambek
part, hence defining functors that have to satisfy more than the monoidal conditions; the categorical properties
of these functors should be studied. Whether or not such extensions prove to be trivial remains
to be seen, but it is certain that augmenting the applicability of this general compositional categorical
formalism will greatly assist its acceptance by the linguistic community as a practical theoretical toolkit.

As for (2), note that the general structure of Baez-Stay diagrams is the same as that of parse trees. This is not
surprising given the Curry-Howard isomorphism between monoidal categories and lambda calculus [58].
Whereas parse trees are purely syntactic and only depict the grammatical structure of sentences, Baez-Stay
diagrams have extra information in their nodes (blobs) about the flow of meaning and the semantic
structure of phrases. The straight lines and the cups allow for a flow between the nodes they connect;
this encodes the order and details of function applications. The clasps, however, stop this flow
from happening, hence making explicit which applications cannot happen. One can identify the clasps
with lines and remove the cups to obtain a parse tree without this information. We have demonstrated
this procedure for an example sentence in Figure 15; here, VP stands for an intransitive, transitive, or
ditransitive verb phrase, and NP stands for a noun phrase.

Figure 15: Baez-Stay Diagrams and Parse Trees

The extra information in the blobs of Baez-Stay diagrams makes them more like proof nets. A proof
net is a diagrammatic notation used to depict the grammatical structures of sentences and their lambda
calculus meanings. Proof nets unify proofs, hence solve the spurious ambiguity problem, and have been
extensively studied by the linear logic [17] and linguistics [54, 46, 42] communities. They contain
semantic information about sentences, represented by traverse instructions that describe how to form
the lambda term corresponding to a grammatical reduction. These lambda terms can also be directly
read from the sequent calculus proofs corresponding to the proof nets. An abstract form of proof nets,
referred to as proof structures, encodes semantic information about sentences to start with. These come
equipped with certain contraction rules and can be rewritten to parse trees using these rules. The rewrites
model the computations that provide us with the final meaning of the sentence. A formal connection
between Baez-Stay diagrams and proof nets or structures constitutes future work. As one of the referees
suggested, one possibility would be to develop a lambda calculus for Baez-Stay diagrams in the style of
glue semantics [16].